Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
The authors were mainly interested in the growth and turnover rates of prokaryotic cells in different habitats, and in the carbon resources available in each habitat and their amounts. They were also interested in N and P resources in the environment. The main questions were: What is the total number of prokaryotes on Earth? Where are the main prokaryotic habitats, and what is the estimated abundance of prokaryotes in different reservoirs (open ocean, soil, and oceanic and terrestrial subsurfaces)? How much carbon is produced by these prokaryotes? What is the total nutrient content (N and P) of prokaryotes? How do prokaryotes affect the carbon cycle?
Soil: They used direct counts from coniferous forest soil (ultisol) and previous studies to calculate the cellular density in soil. To calculate the total number of prokaryotic cells in soil, they combined the amount of soil on Earth, taken from the literature, with the cellular density values.
Aquatic environments (oceanic reservoirs, polar regions, and freshwater/saline lakes): To determine the number of prokaryotic cells, they multiplied the average estimated cellular density in aquatic environments by the estimated volumes of marine and fresh water on Earth from the literature. For polar regions, they relied on the mean number of prokaryotes published by Delille & Rosiers (literature) and the mean areal extent of seasonal ice.
Terrestrial subsurface: The estimated volume of ground water on Earth, taken from the literature, was multiplied by the prokaryotic numbers measured in ground water at several sites. In addition, they calculated the number of prokaryotes in the terrestrial subsurface from the average porosity of the terrestrial subsurface, the fraction of the total pore space occupied by prokaryotes, and the volume of the upper 4 km of the terrestrial subsurface (the first two being assumptions).
Carbon content and production: They estimated the amount of carbon in prokaryotes from the number of prokaryotic cells in soil, aquatic systems, and the subsurface. For soil and the subsurface, they assumed that cellular carbon is one-half of the dry weight. They also assumed that the amount of carbon produced by prokaryotes during each turnover is about four times their carbon content; using this assumption and the turnover rates, they calculated the production of prokaryotic carbon. For aquatic systems, they assumed an average cellular carbon of 5 fg of C/cell and multiplied this by the number of prokaryotic cells in the aquatic system to obtain the total amount of prokaryotic cellular carbon.
-The total number of prokaryotes and the amount of cellular carbon associated with prokaryotes were estimated to be 4-6x10^30 cells and 350-550 Pg of C, respectively.
-The total amount of prokaryotic carbon is 60-100% of the estimated total carbon in plants.
-Prokaryotes contain large amounts of N (85-130 Pg) and P (9-14 Pg): ~10-fold more than those found in plants. Therefore, prokaryotes represent the largest pool of nutrients in living organisms.
-Prokaryotes are most abundant in the open ocean, soil, and the oceanic and terrestrial subsurfaces, with cell numbers of 1.2x10^29, 2.6x10^29, 3.5x10^30, and 0.25-2.5x10^30, respectively.
-They estimated the cellular production rate of prokaryotes on Earth to be 1.7x10^30 cells/yr and found that cellular production is highest in the open ocean.
-Lastly, the high mutation rates, large population size, and rapid growth of prokaryotes result in genetic diversity.
-Why is the residence time higher than expected for the subsurface?
-How do we define species and their phylogenetic relationships?
-Were the calculations done properly, and are the results valid given how many assumptions were made in the process of calculation?
-Given the high abundance of prokaryotes in certain environments, what is their role in the total metabolic potential of the ecosystem?
Why was the cellular rate of production highest in open ocean? Why was the estimated average prokaryotic turnover time lower than expected?
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
The primary prokaryotic habitats on Earth are aquatic (seawater), soil, and marine sediment/soil subsurface. The total number of prokaryotes is 255.6x10^27 cells in soil, 1.181x10^29 cells in aquatic habitats (mostly found in the upper 200 m of the open ocean), and 3.8x10^30 cells in subsurface sediments.
The estimated prokaryotic cell abundance in the upper 200 m of the ocean is 3.6x10^28 cells at a cellular density of 5x10^5 cells/mL. In this habitat, the average cellular density of autotrophic marine cyanobacteria (including Prochlorococcus) is 4x10^4 cells/mL. Autotrophs therefore make up 8% of the cells in the upper 200 m of the ocean, and they supply the carbon that supports the remaining 92%, which are heterotrophs.
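As a quick consistency check, the three habitat totals quoted above can be summed and compared with the 4-6x10^30 range reported earlier. The sketch below only re-adds the numbers from these notes; the dictionary keys and function names are illustrative, not taken from the paper.

```python
# Habitat totals as quoted in the notes above (cells).
habitat_cells = {
    "soil": 255.6e27,
    "aquatic": 1.181e29,
    "subsurface": 3.8e30,
}


def total_cells(cells_by_habitat):
    """Sum the cell counts over all habitats."""
    return sum(cells_by_habitat.values())


def habitat_shares(cells_by_habitat):
    """Return each habitat's fraction of the global total."""
    total = total_cells(cells_by_habitat)
    return {name: count / total for name, count in cells_by_habitat.items()}


total = total_cells(habitat_cells)
shares = habitat_shares(habitat_cells)

print(f"total = {total:.2e} cells")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of all cells")
```

The sum falls inside the reported 4-6x10^30 range, and the subsurface accounts for the large majority of cells.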
Calculation: (4x10^4 cells/ml)/(5x10^5 cells/ml)x100=8%
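The same ratio can be written as a small helper. This is a minimal sketch using only the two densities quoted above; the function name is illustrative.

```python
def percent_of_total(part_density, total_density):
    """Percentage of the total cell density contributed by one group."""
    if total_density <= 0:
        raise ValueError("total_density must be positive")
    return 100.0 * part_density / total_density


# Densities from the notes (cells/mL) in the upper 200 m of the ocean.
autotroph_density = 4e4
total_density = 5e5

autotroph_pct = percent_of_total(autotroph_density, total_density)
heterotroph_pct = 100.0 - autotroph_pct

print(f"autotrophs: {autotroph_pct:.0f}%, heterotrophs: {heterotroph_pct:.0f}%")
```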
According to the article, autotrophs are organisms that use the substances around them to make complex organic compounds (they fix inorganic carbon, e.g. CO2 -> biomass). These organisms therefore produce their own food.
Heterotrophs: assimilate organic matter to produce energy (obtain carbon from organic sources)
Lithotrophs: assimilate and metabolize inorganic matter and release energy (obtain electrons from inorganic sources)
The deepest habitat capable of supporting prokaryotic life is terrestrial subsurface sediment at a depth of about 4,000 m (4 km). The primary limiting factor is temperature, which averages about 125C at that depth (temperature rises by roughly 22C per km), close to the upper temperature limit for prokaryotic life. The deepest point on Earth where life can exist is the Mariana Trench, which is about 10.9 km deep, and prokaryotic cells can reside a further 4-5 km below it. The limiting factor at this depth is again temperature.
The highest point on Earth where life exists is Mount Everest, which is about 8.8 km above sea level. However, the highest habitat capable of supporting prokaryotic life is the atmosphere, where prokaryotes are present as high as 57-77 km (although a more realistic boundary would be around 20 km above Mount Everest). The limiting factors include the change in gas composition and pressure with altitude (for example, a lower oxygen percentage at higher altitudes), the lack of moisture and nutrients, and high ionizing radiation. Temperature could be another limiting factor in this habitat. Therefore, the extent over which life exists on Earth is around 24 km.
The biosphere is the part of the Earth where life exists, and it spans a vertical distance of about 24 km: from the top of Mount Everest (8.8 km above sea level) to the bottom of the Mariana Trench (10.9 km deep), plus an additional 4 km of subsurface below the trench.
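The 24 km figure is simply the sum of the three distances given above. A minimal sketch of that arithmetic, with constant names chosen here for illustration:

```python
# Distances quoted in the notes (km).
EVEREST_HEIGHT_KM = 8.8      # above sea level
MARIANA_DEPTH_KM = 10.9      # below sea level
SUBSURFACE_EXTRA_KM = 4.0    # habitable subsurface below the trench floor

biosphere_extent_km = EVEREST_HEIGHT_KM + MARIANA_DEPTH_KM + SUBSURFACE_EXTRA_KM

print(f"vertical extent of the biosphere: about {biosphere_extent_km:.1f} km")
```

The sum is 23.7 km, which the notes round to 24 km.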
Annual cellular production = (population size) x (number of days in a year)/(turnover time in days) = cells/year
Sample calculation: (3.6x10^28 cells) x (365 days/year)/(16 days per turnover) = 8.2x10^29 cells/year
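The sample calculation can be reproduced in a few lines. This is a sketch under the assumption that the 16 in the sample is the turnover time in days; the function name is illustrative.

```python
def annual_cell_production(population_cells, turnover_days, days_per_year=365):
    """Cells produced per year if the whole population is replaced
    once every `turnover_days` days."""
    if turnover_days <= 0:
        raise ValueError("turnover_days must be positive")
    turnovers_per_year = days_per_year / turnover_days
    return population_cells * turnovers_per_year


# Upper 200 m of the ocean: 3.6x10^28 cells, turnover time of 16 days.
production = annual_cell_production(3.6e28, 16)

print(f"annual production = {production:.1e} cells/year")
```

With these inputs the result is about 8.2x10^29 cells/year, matching the sample calculation.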
-Carbon content of a prokaryotic cell = ~20 fg (assuming 5-20 fg of C/prokaryotic cell) = 20x10^-30 Pg/cell
-Carbon assimilation efficiency: 0.2 (20% of the carbon consumed is converted into biomass)
-Number of cells = 3.6x10^28 cells
-(3.6x10^28 cells)(20x10^-30 Pg/cell) = 0.72 Pg C in marine heterotrophs
-(0.72 Pg C)(4) = 2.88 Pg C needed per turnover
-51 Pg C/year -> 85% consumed = 43 Pg C consumed per year
-(43 Pg C/year)/(2.88 Pg C per turnover) = 14.9 turnovers per year
-1 turnover every 24.5 days
A larger population contains more carbon and, for a fixed carbon supply, takes longer to turn over. Carbon content therefore depends on turnover rate and population size, assuming carbon assimilation efficiency remains constant. These values vary with the amount of exposure to sunlight and with the lytic bacteriophages that are abundant in the upper 200 m of the ocean.
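The chain of steps above can be sketched as a short script. It uses only the values quoted in these notes (20 fg C per cell, a factor of 4 per turnover, 51 Pg C/year with 85% consumed, rounded to 43 Pg C/year as in the notes); the function and variable names are illustrative.

```python
FG_PER_PG = 1e30  # 1 Pg = 10^15 g and 1 fg = 10^-15 g


def carbon_pool_pg(cells, fg_carbon_per_cell):
    """Standing stock of cellular carbon in Pg."""
    return cells * fg_carbon_per_cell / FG_PER_PG


def turnover_time_days(carbon_consumed_pg_per_year, carbon_pool, demand_factor=4,
                       days_per_year=365):
    """Days per turnover, given the yearly carbon supply and the carbon
    needed per turnover (demand_factor x standing stock)."""
    carbon_per_turnover = demand_factor * carbon_pool
    turnovers_per_year = carbon_consumed_pg_per_year / carbon_per_turnover
    return days_per_year / turnovers_per_year


pool = carbon_pool_pg(3.6e28, 20)      # Pg C in marine heterotrophs
consumed = 43.0                        # Pg C/year: 85% of 51, rounded as in the notes
days = turnover_time_days(consumed, pool)

print(f"carbon pool = {pool:.2f} Pg C")
print(f"turnover time = {days:.1f} days")
```

The result lands between 24 and 25 days per turnover. The exact decimal depends on whether the intermediate values (43 Pg C/year, 14.9 turnovers) are rounded before the final division.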
One of the factors that results in genetic diversity and the formation of novel species is mutation. In addition to mutations, the mechanisms of Horizontal Gene Transfer (HGT), namely transformation, transduction, and conjugation, are responsible for the genetic diversity seen in prokaryotes. Prokaryotic cells that carry favourable mutations adapt better to their surrounding environment and survive better.
A large prokaryotic population allows rare events to occur more frequently in nature. For example, prokaryotes have enormous potential to acquire genetic diversity by accumulating mutations; however, growth rates must also be taken into account when measuring potential mutational changes in a population. Large, slow-growing populations may produce fewer cells and fewer mutations than small, fast-growing populations. Therefore, high abundance combined with a high growth rate, and hence rapid replication, yields more mutations, which increases the diversity of the population. Increased diversity in turn increases the metabolic potential of the population in response to selective pressure and stress.
Hadean
-4.6 Ga: Solar system formed (Sun 30% less luminous than today); inner planets received water vapour (increase in vapour pressure, 500 C) and carbon (increase in carbon dioxide (CO2), limestone)
-4.5 Ga: Moon formed; its formation spun and tilted the Earth, producing day/night cycles and seasons
-4.5-4.1 Ga: high levels of CO2 increased temperature
-4.4 Ga: zircon formation (oldest mineral)
-4.4-4.1 Ga: meteorite impacts
-4.1 Ga: evidence of life in zircon from carbon isotopes
-4 Ga: oldest rock (Acasta Gneiss) and evidence of plate subduction
Archaean
-3.8 Ga: meteorite bombardment halted -> seawater chemistry stabilized; sedimentary rocks -> possible existence of life? (carbon isotopes again); sulphur reduction (processing); Rubisco (for carbon fixation)
-3.7 Ga: early methanogenesis -> greenhouse CH4:CO2
-3.5 Ga: evidence of life -> photosynthesis (microfossils) and stromatolites (bacterial aggregations)
-3.5-2.7 Ga: cyanobacteria photosynthesize
-2.7 Ga: Great Oxidation Event, responsible for glaciation (without methanogenesis the Earth would have been frozen during the Archaean and, more interestingly, if the Earth had been oxygenated before this time it would also have frozen, because methanogens do not tolerate O2); emergence of eukaryotes at the end of the Archaean -> life on land
Proterozoic
-2.5-1.5 Ga (Proterozoic I): red rock beds observed (evidence of oxidation); oxygen levels increase sharply: microaerobic early atmosphere -> oxic air (complex eukaryotes involved?); cellular cybernetic switch between mitochondria and chloroplasts -> may control the link between photosynthesis CO2 and nitrogen fixation
-1.96-1.8 Ga: cyanobacteria -> emergence of eukaryotes and algae
-1.1 Ga: Snowball Earth occurs (glaciation)
Phanerozoic
540-250 Ma: Paleozoic
-Expansion of multicellular evolution -> Cambrian explosion: increased diversity of life, larger organisms, and land plants were observed
-increased oxygenation of the atmosphere
-Carboniferous period: fish, cephalopods, and corals
-Devonian explosion
-emergence of woody land plants about 400 million years ago
-Formation of Pangea -> dry, harsh climate in Pangea’s interior
-Permian extinction: ~95% of species went extinct
-rapid speciation between the Paleozoic and Mesozoic
250-65 Ma: Mesozoic
-rise of dinosaurs (gigantism)
-Atlantic Ocean
-Cretaceous-Tertiary extinction event
-Immediately post extinction -> nothing over 10 kg on land
65-0 Ma: Cenozoic
-Dramatic global warming
-Mammal diversification
-grasses appear
-Ice age
Today
-Quaternary period
-200,000 years ago: Homo sapiens first appear
Hadean: There was a massive amount of greenhouse CO2 to keep the Earth warm, as the Sun was 30% less luminous than today. The Earth was spun and tilted as a result of the formation of the Moon, which gave us the present day-night cycles and the seasons. The Earth suffered massive meteorite impacts. Earth was mostly molten rock and very hot.
Archaean: The atmosphere was filled with CH4, produced through methanogenesis by methanogens, which kept the Earth warm. As photosynthesis evolved and cyanobacteria photosynthesized, some O2 became present on Earth.
Proterozoic: O2 reacted with atmospheric methane to produce CO2, which led to a net decrease in the greenhouse effect, making the Earth cold and resulting in glaciation (Snowball Earth). Oxygen on Earth started oxidizing iron into banded iron formations, seen in sedimentary rock.
Phanerozoic: Increased diversity of life, larger organisms, and land plants were observed. Coal deposits developed as organisms died in the Permian extinctions and were stored in sediments. There were also occasional glaciation periods.
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
The primary geophysical processes that create and sustain conditions for life on Earth are tectonics and atmospheric photochemical processes, which continuously supply substrates and remove products and thereby create geochemical cycles. Both of these geophysical processes allow elements and molecules to interact with each other, and chemical bonds to form and break. The biogeochemical processes that create and sustain conditions for life on Earth are acid-base and redox reactions. Abiotic acid-base chemistry is the transfer of protons without electrons, whereas biotic redox reactions involve successive transfers of electrons and protons and drive the cycling of more of the major elements: C, H, N, O, and S. The biotic redox reactions depend on less external energy than the abiotic acid-base reactions. Abiotic processes such as volcanism and rock weathering are also very important for the resupply of C, S, and P. Abiotic and biotic processes are interconnected because, on a planetary scale, biogeochemical cycles form a nested set of abiotically driven acid-base reactions and biologically driven redox reactions that set lower limits on external energy.
Feedbacks between the microbial metabolisms and geochemical processes create the average redox condition of the oceans and atmosphere. Therefore, the Earth’s redox state is considered an emergent property of microbial life on a planetary scale. The oxidation of Earth is driven by photosynthesis, which is not directly dependent on preformed bond energy.
An example of a biologically irreversible reaction is the conversion of inert N2 to NH4+ through nitrogen fixation, which is the only process that makes N2 accessible for the synthesis of proteins and nucleic acids. This reaction is catalyzed by an enzyme called nitrogenase, which is inhibited by oxygen. In addition, microorganisms carry genes that encode the machinery for the redox half-cells of energy-transducing pathways. To overcome thermodynamic barriers, microbes use identical or near-identical pathways for forward and reverse reactions. When the concentrations of the substrates become very low, reverse reactions can become possible: because reactions tend towards equilibrium, low substrate concentrations shift the equilibrium back towards the substrates. Beyond equilibrium effects, metabolic relationships and interactions between microorganisms also make reversible reactions possible, because one organism can provide energy or a metabolite that another organism uses either to perform the opposite reaction or to create an environment where the reverse reaction is favourable.
For N2 to be accessible for the synthesis of proteins and nucleic acids in organisms, it has to be converted to NH4+ through nitrogen fixation, which is catalyzed by an enzyme called nitrogenase that is inhibited in the presence of oxygen. In the presence of oxygen, however, NH4+ is oxidized to nitrate in a two-stage pathway. In the first step, a group of bacteria or archaea oxidizes ammonia to NO2-, and in the second step, NO2- is oxidized to NO3- by a different group of nitrifying bacteria. The nitrifiers use the small differences in redox potential in these reactions to reduce CO2 to organic matter. In the absence of oxygen, another set of opportunistic microbes uses NO2- and NO3- as electron acceptors in the anaerobic oxidation of organic matter, which ultimately leads to the formation of N2 and closes the N-cycle. The interdependent electron pool of the N-cycle is influenced by the availability of organic matter and by the production of oxygen through photosynthesis. It is important to note that climate change affects sunlight availability, which in turn affects photosynthesis, since sunlight is the main source of energy in photosynthesis. Photosynthesizers that use nitrogen oxides as electron acceptors will be affected by changes in sunlight. Conversely, the N-cycle could also influence climate change, because nitrifying organisms may use NH4+ or NO2- for the reduction of CO2, which reduces greenhouse gases.
Through time, metabolic pathways have evolved to make use of the available substrates that are the end products of other microbial metabolisms. The series of redox half-reactions of a given elemental cycle, such as the nitrogen cycle, is distributed between different organisms, which creates interactions within a microbial community in which each organism has a specialized role. When more metabolic diversity is present, organisms take on special roles in different pathways that require the help of other organisms, which leads to microbial diversity. In addition, horizontal gene transfer not only makes it possible for microbes of different species to exchange genes, but also allows entire metabolic pathways to be transferred from one species to another.
The number of protein families within individual bacterial and archaeal genomes has a linear relationship with the number of newly discovered genes per genome. Genome size appears to correlate with evolutionary rate, but not with metabolic processes. Diverse organisms live in diverse environments, which leads to different, adapted genes that produce the proteins required to survive in those specific conditions. Thus, with more microbial diversity, more diverse metabolic pathways are required for organisms to live in their environments, which results in new proteins being expressed to repair or replace existing proteins.
Evaluate human impacts on the ecology and biogeochemistry of Earth systems.
What are the boundaries that define a safe operating space for humanity? What are the Earth-system processes and their corresponding thresholds that could generate unacceptable environmental change if crossed? What parameters play a role in the boundaries set for each process? How are the boundaries for different processes coupled and connected?
Planetary boundaries: Planetary boundaries are values for control variables. They are either at a safe distance from the thresholds or at dangerous levels. The authors took a conservative, risk-averse approach to quantifying the planetary boundaries by taking into account the large uncertainties that surround the true positions of many thresholds.
To set the boundaries, the proposed values and ideas were based on connecting the existing literature on each topic, such as carbon dioxide levels in the atmosphere and fossil records of extinction rates.
Nine processes and their corresponding thresholds were defined, for which if the threshold is crossed, unacceptable environmental changes are generated. These 9 processes that need defined planetary boundaries are: climate change, rate of biodiversity loss, interference with the nitrogen and phosphorus cycles, stratospheric ozone depletion, ocean acidification, global fresh water use, change in land use, chemical pollution, and atmospheric aerosol loading.
Humanity may soon approach the boundaries for global freshwater use, change in land use, ocean acidification, and interference with the global phosphorus cycle. In addition, three of the Earth-system processes (climate change, rate of biodiversity loss, and interference with the nitrogen cycle) have already crossed their boundaries.
Human changes to atmospheric CO2 concentrations should not exceed 350 parts per million by volume and the radiative forcing should not exceed 1 watt per square meter above pre-industrial levels.
Setting a planetary boundary for biodiversity loss is difficult because science cannot yet provide the information required, such as how much and what kinds of biodiversity can be lost before the resilience of an ecosystem is eroded; more research is therefore required.
The boundary for the nitrogen cycle treats human fixation of nitrogen as a big “valve” that controls a massive flow of new reactive nitrogen into the Earth system. The boundary would restrict the flow of new reactive nitrogen to 25% of its current value, or about 35 million tonnes of nitrogen per year.
The proposed model by the authors suggests that anoxic ocean events become more likely within 1000 years given that there is a greater than tenfold increase in phosphorus flowing in the oceans.
As long as the thresholds are not crossed, humans can pursue long-term social and economic development.
How reliable are the proposed models for setting boundaries? How long would it take for each of these threshold values to be crossed if the same human actions continue?
“Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.”
Microorganisms are responsible for life on Earth as it is today. If microorganisms had not existed as the first inhabitants, today’s Earth would be frozen and free of oxygen, and humans would have been unable to adapt and survive. Microorganisms, on the other hand, were able to adapt and survive from the start, when no other life existed on Earth and conditions were extreme. Therefore, microbial life can live without humans; however, humans cannot survive without microbes’ global catalysis and their impacts on different environments. To address this topic, it is important to discuss humans’ dependency on the reactions performed by microbes, the roles of microbial life in human health, and the great genetic diversity of microbes that allows for high adaptability in different environments.
Firstly, humans are both directly and indirectly dependent on the reactions performed by microbes. Two examples of these reactions are the production of almost all the oxygen we breathe through photosynthesis, and the conversion of elemental nitrogen into a form usable by plants through the various oxidation and reduction reactions that drive Earth’s nitrogen cycle, with the latter being an indirect dependency of humans on microbial reactions (1). Higher plants are responsible for most of the photosynthesis that happens on land; however, terrestrial photosynthesis is balanced by the reverse reactions of respiration and decay (2). Therefore, it has no net impact on the atmospheric oxygen levels used by humans (2). On the other hand, a small leak in the marine organic carbon cycle results in about 0.1% of the organic matter synthesized through photosynthesis by single-celled organisms being buried in the sediments (2). This results in a net source of oxygen that accounts for most of our atmospheric oxygen (2). Another reaction that humans depend on indirectly, performed through multispecies microbial interactions, is the conversion of inert elemental nitrogen gas, which has an atmospheric residence time of about 1 billion years, to a form that plants can use for nucleic acid and protein synthesis (1). Humans then consume these plants and the animals that feed on them. The reductive, irreversible process of converting elemental nitrogen to NH4+ is catalyzed by nitrogenase, a conserved enzyme complex that is inhibited in the presence of oxygen (1). NH4+ can then be oxidized to nitrate in the presence of oxygen through a two-stage process, with the first step being the oxidation of ammonia to NO2− by a particular group of Bacteria or Archaea and the second step being the oxidation of NO2− to NO3− by a different group of nitrifying bacteria (1).
Finally, opportunistic microbes use NO2− and NO3− as electron acceptors in the oxidation of organic matter in the absence of oxygen, which ultimately leads to the formation of N2 and the completion of the nitrogen cycle (1). Therefore, it is these different microbial species and their interactions that drive both the production of oxygen on Earth and the series of coupled redox reactions that complete the nitrogen cycle.
Secondly, trillions of microbes have lived and evolved on and within human beings, protected humans from pathogens, and developed symbiotic relationships with them. To expand on the role of microbes in the human body and in human health, the human gut microbiota will be discussed. The human gut microbiota is considered a separate organ in itself because of its great metabolic capability and functional plasticity. The gut microbiota is involved in human biological processes such as regulating and adjusting the metabolic phenotype, protecting against foreign pathogens, and developing innate immunity (3). Regarding modulation of the metabolic phenotype, the gut microbiota uses specific enzymes and biochemical pathways encoded by genes that are not found in the human genome, which gives it the potential to increase energy and nutrient extraction from food as well as to alter appetite signals (3). Furthermore, microbes in the gut are involved in the metabolism of undigested carbohydrates and in the biosynthesis of vitamins such as vitamin K, which is a required co-factor for the production of blood clotting factors (4). The loss of vitamin K-producing microbes through antibiotic treatment can result in excessive bleeding upon disruption of the outer epithelial layer. In addition, the ability of the intestinal flora to metabolize indigestible carbohydrates to short-chain fatty acids allows humans to digest plant-based foods, which would be impossible in the absence of these microbes (4). Furthermore, the human gut microbiota acts as a barrier, through the production of antimicrobials and “competitive exclusion”, to protect the host from foreign pathogens (3). Lastly, the gut microbiota is essential for the development of the intestinal mucosa and immune system of the host, usually by promoting the maturation of immune cells or the development of the immune system.
An example of this is the development of innate immunity through the engagement of TLR-4 on immune cells of myeloid lineage, which activates an intracellular signaling network leading to cytokine and chemokine production (5). The microbes that reside on and in humans therefore play a major role both metabolically and in shaping the immune system. Their absence would not only cause metabolic constraints but would also put the human body in an immunocompromised state, possibly resulting in the development of various diseases.
Lastly, it is important to note that microbes are independent of humans and neither rely on nor need humans for their survival. Despite all the symbiotic relationships that exist today between microorganisms and humans, such as in the previously mentioned human intestine, microorganisms are able to evolve and adapt to any condition, as they have done on the ever-changing planet Earth (6). The number of prokaryotic cells on Earth is calculated to be 4-6x10^30 cells, with the slowest turnover time being 2.5 years in soil (7). The large population size of prokaryotes and their rapid growth and turnover rates provide the potential for significant genetic diversity, which allows these microorganisms to evolve and adapt to all types of environments, as has been shown throughout the Earth’s history. Microbes have survived extreme conditions such as hot ocean bottlenecks where only thermophiles could survive (6). Environmental selection leads to the evolution of boutique genes that protect the metabolic pathways used by microbes; therefore, if one unit goes extinct, the core metabolic pathway survives in another unit that has received the core planetary gene sets through either vertical or horizontal gene transfer (1). Microbes thus act as vessels that carry metabolic machines through long periods of time, different geological settings, and extreme environmental disturbances (1). An example of the adaptation and evolution of microbes is seen in nitrogen fixation by cyanobacteria using the nitrogenase enzyme. Nitrogenase is poisoned by oxygen; however, cyanobacteria have evolved complex mechanisms for protecting this enzyme. A few examples of these mechanisms, which highlight the highly evolved machinery used by cyanobacteria, are fixing nitrogen only in specialized cells called heterocysts, fixing nitrogen at night and photosynthesizing by day, and fixing nitrogen in the morning and photosynthesizing in the afternoon (2).
In conclusion, humans are dependent on the reactions performed by microorganisms as well as on their roles in human survival and health. Humans may be able to survive without microbes for a few days; however, in the absence of microbes the Earth would very quickly become an uninhabitable place for humans. Microorganisms, on the other hand, are fast-evolving organisms that can survive and adapt to every type of environment without depending on humans. Therefore, microorganisms are necessary for human life on Earth, and their presence and activity provide the conditions for a habitable Earth.
References:
Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320:1034-1039.
Kasting JF, Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science 296:1066-1068.
Achenbach J. 2012. Spaceship Earth: A new view of environmentalism. The Washington Post 1–4.
Canfield DE, Glazer AN, Falkowski PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science 330:192–196.
Rockstrom J et al. 2009. A safe operating space for humanity. Nature 461:472–475.
Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature 409:1083-1091.
Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences 95:6578-6583.
Falkowski P et al. 2000. The Global Carbon Cycle: A Test of Our Knowledge of Earth as a System. Science 290:291–296.
Waters CN et al. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science 351:137–147.
Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320:1034-1039. PMID18497287
Kasting JF, and Siefert JL. 2002. Life and the evolution of Earth’s atmosphere. Science. 296(5570):1066-1068. PMID12004117
Leopold A. 1949. The Land Ethic. In A Sand County Almanac. Oxford University Press. London.
Zehnder A.J.B. 1988. Biology of Anaerobic Microorganisms.
Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature 409:1083-1091. (https://www.nature.com/articles/35059210)
Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences 95:6578-6583. PMC33863
Waters CN et al. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science 351:137–147. PMID26744408
Schrag DP. 2012. Geobiology of the Anthropocene. Fundamentals of Geobiology 425–436. (https://onlinelibrary.wiley.com/doi/10.1002/9781118280874.ch22)
Kallmeyer J, Pockalny R, Adhikari RR, Smith DC, and D’Hondt S. 2012. Global distribution of microbial abundance and biomass in subseafloor sediment. Proc Natl Acad Sci USA. 109(40):16213-16216. PMID22927371
Mooney C. 2016. Scientists say humans have now brought on an entirely new geologic epoch. The Washington Post 1–5. (https://www.washingtonpost.com/news/energy-environment/wp/2016/01/07/scientists-say-humans-have-now-brought-on-an-entirely-new-geologic-epoch/?utm_term=.a25428157ae9)
Rockstrom J et al. 2009. A safe operating space for humanity. Nature 461:472–475. (https://www.nature.com/articles/461472a)
Achenbach J. 2012. Spaceship Earth: A new view of environmentalism. The Washington Post. WP Company. (www.washingtonpost.com/national/health-science/spaceship-earth-a-new-view-of-environmentalism/2011/12/29/gIQAZhH6WP_story.html)
Canfield DE, Glazer AN, Falkowski PG. 2010. The Evolution and Future of Earth’s Nitrogen Cycle. Science 330:192–196. PMID20929768
Falkowski P, Scholes RJ, Boyle E, Canadell J, Canfield D, Elser J, Gruber N, Hibbard K, Högberg P, Linder S, Mackenzie FT, Moore B 3rd, Pedersen T, Rosenthal Y, Seitzinger S, Smetacek V, and Steffen W. 2000. The Global Carbon Cycle: A Test of Our Knowledge of Earth as a System. Science 290:291–296. PMID11030643
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
Solden L, Lloyd K, Wrighton K. 2016. The bright side of microbial dark matter: lessons learned from the uncultivated majority. Current Opinion in Microbiology 31:217–226. PMID27196505
Schloss PD, Girard RA, Martin T, Edwards J, Thrash C. 2016. Status of the Archaeal and Bacterial Census: an Update. mBio 7:e00201-16. PMC4895100
Up to 2016, 89 bacterial phyla and 20 archaeal phyla had been recognized using small subunit (16S) rRNA databases. However, the true phylum count is much higher, up to 15000 bacterial phyla, because many microbes live in a “shadow biosphere”. In 2006, a study claimed that 24 of the 65 bacterial phyla identified at the time had no cultured representatives, and 14 of the 20 archaeal phyla have no cultured representatives.
https://www.ebi.ac.uk/metagenomics/
Thousands: 110,217 on the EBI database, which accounts for only a small fraction of the projects that are ongoing. The environments that the sequences are sourced from include all types: soil, aquatic systems, sediments, and host-associated samples (humans, mammals, and plants).
NCBI: https://www.ncbi.nlm.nih.gov
MG-RAST: https://www.mg-rast.org
Analysis pipelines: Megan 5: http://ab.inf.uni-tuebingen.de/software/megan5/
Annotation: KEGG: http://www.genome.jp/kegg/annotation/
Binning: s-GCOM
Assembly: Euler: https://omictools.com/euler-sr-tool
Ing/M
Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J. 2008. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Research 36:2230–2239. PMC2367736
Phylogenetic: inherited by vertical gene transfer, carries phylogenetic information, allows for tree reconstruction, taxonomically informative, and ideally single copy.
Functional: more subject to horizontal gene transfer, identifies specific biogeochemical functions associated with measurable effects, not as useful for phylogeny.
Teeling H, Glockner FO. 2012. Current opportunities and challenges in microbial metagenome analysis–a bioinformatic perspective. Briefings in Bioinformatics 13:728–742. PMC3504927
Wooley JC, Godzik A, Friedberg I. 2010. A Primer on Metagenomics. PLoS Computational Biology 6. (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000667)
Metagenomic sequence binning associates sequence data with the OTU of its origin in order to learn what the different OTUs are doing. In other words, binning is placing each sequence in its correct “bin”, or OTU (Wooley et al.). Binning approaches work without reference sequences and cluster sequences based on compositional characteristics (Teeling & Glockner). The algorithmic approaches used to produce sequence bins are composition-based binning and phylogenetic binning. In composition-based binning, the GC content of bacterial genomes is used for higher-level systematics. A program used for this type of binning is TETRA. Another composition-based method is codon usage, where ORF sequences are classified based on the codon frequencies that different species use to encode the same amino acids. In phylogenetic binning, or similarity-based binning, similarities to reference sequences are found and can then be used to build a tree. MEGAN uses this method by reading a BLAST output file. CARMA is similar to MEGAN; however, it uses Pfam as its source of taxonomic classification (Wooley et al.).
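As a rough illustration of the compositional signals mentioned above, the sketch below computes GC content and overlapping tetranucleotide counts in base R. The helper names (`gc_content`, `kmer_counts`), the toy contigs, and the 0.5 GC cut-off are all assumptions made up for this example; they are not how TETRA or any other binning program works internally.

```r
# Hypothetical helper: fraction of G and C bases in a DNA string.
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Hypothetical helper: counts of overlapping k-mers
# (k = 4 gives tetranucleotide counts).
kmer_counts <- function(seq, k = 4) {
  seq <- toupper(seq)
  starts <- seq_len(nchar(seq) - k + 1)
  table(substring(seq, starts, starts + k - 1))
}

# Toy contigs, invented for illustration only.
contigs <- c(contig_a = "ATATATTTAAATATAT",
             contig_b = "GCGCGGCCGCGCGGCC",
             contig_c = "ATTTATATAAATTTAT")

gc <- sapply(contigs, gc_content)

# A very crude "bin": contigs with similar GC content are grouped together.
# The 0.5 cut-off is arbitrary and chosen only for this toy example.
bins <- ifelse(gc > 0.5, "high_GC", "low_GC")
bins

# Tetranucleotide profile of one contig.
kmer_counts(contigs[["contig_a"]])
```

Real composition-based binners compare full k-mer profiles between contigs rather than applying a single cut-off, so this only shows what kind of signal is being measured.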
The risks and opportunities
• Phylogenetic marker genes are sparse, so they allow taxonomic assignment of only a minor portion of the sequences (incomplete coverage).
• Contamination from different phylogenies (what is considered contamination?).
• Binning methods can be used prior to assembly or taxonomic classification to partition reads into taxonomic bins. This significantly reduces the complexity of metagenome assemblies.
Gawad C, Koh W, Quake SR. 2016. Single-cell genome sequencing: current state of the science. Nature Reviews Genetics 17:175–188. PMID26806412
Wang Y, Navin NE. 2015. Advances and Applications of Single-Cell Sequencing Technologies. Molecular Cell 58:598–609. PMC4441954
Alternatives to metagenomic shotgun sequencing are functional screens (biochemical, etc.), third-generation sequencing (nanopore), single-cell sequencing, and FISH probes. Some of the risks and challenges associated with single-cell sequencing are efficiently isolating individual cells, amplifying a single cell’s genome sufficiently to obtain enough data for analysis, querying the genome in a cost-effective way, and interpreting the data in the context of the biases and errors introduced in the previous steps. The opportunities associated with single-cell sequencing include insight into rare cells, specifically cancer cells, since tumours evolve from single normal cells.
Discuss the relationship between microbial community structure and metabolic diversity
Evaluate common methods for studying the diversity of microbial communities
Recognize basic design elements in metagenomic workflows
What is the physiological basis of light activated growth stimulation in PR-containing marine bacteria? What is the function of each gene in the photosystem biosynthetic pathway?
• To characterize PR photosystem genetics and biochemistry, a marine picoplankton large-insert genomic library was surveyed for recombinant clones expressing PR photosystems in vivo.
• To verify the functional annotation of each gene product in the intact PR-based photosystem biosynthetic pathways, insertional mutants were analyzed using cell pigmentation and HPLC: genetic and biochemical analysis of transposon mutants was used to verify the function of gene products in the photopigment and opsin biosynthetic pathways.
• A luciferase-based assay was used to measure light-induced changes in ATP levels in the PR-photosystem-containing clones and PR- mutant derivatives.
• Light-activated proton-translocation activity was assayed in cells grown under high-copy-number conditions.
• The screening process exploited a transient increase in vector copy number, which significantly enhanced the sensitivity of phenotypic detection. In addition, two genetically distinct recombinants that were initially identified by their orange pigmentation were expressing a small cluster of genes encoding a complete PR-based photosystem.
• A fully functional PR photosystem that enabled photophosphorylation in recombinant E. coli cells was generated upon heterologous expression of six genes.
• Phototrophic capabilities in a chemoorganotrophic microorganism can be gained as a result of a single genetic event, which explains the presence of PR photosystems among diverse microbial taxa.
• Spectral tuning: the genes for the proton pump have evolved to respond to the amount of light that reaches the ocean depth at which the organisms reside.
• Increasing fosmid copy number can significantly enhance detectable levels of recombinant gene expression and therefore increases the detection rate of desired phenotypes in metagenomic libraries.
• Results from cell pigmentation and HPLC supported the “functional assignments” of genes associated with PR biosynthetic pathways and demonstrated that these genes are both necessary and sufficient to induce retinal biosynthesis in E. coli cells.
• A decrease in pH was observed during the proton-translocation assay in PR+ clones but not in mutants containing transposon insertions in the PR gene.
• ATP measurements showed a significant light-induced increase in cellular ATP in PR+ clones but not in PR- mutants.
Are these results consistent and reproducible in bacterial cells other than E. coli?
Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology 3:439–446. PMID15864265
Martinez A, Bradley AS, Waldbauer JR, Summons RE, Delong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences 104:5590–5595. PMC1838496
Taupp M, Mewis K, Hallam SJ. 2011. The art and design of functional metagenomic screens. Current Opinion in Biotechnology 22:465–472. PMID 21440432
Wooley JC, Godzik A, Friedberg I. 2010. A Primer on Metagenomics. PLoS Computational Biology 6: 1-16. (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000667)
• Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
• Explain the relationship between microdiversity, genomic diversity and metabolic potential
• Comment on the forces mediating divergence and cohesion in natural microbial communities
What are the genomic differences among uropathogenic E. coli CFT073, enterohemorrhagic E. coli EDL933, and laboratory strain MG1655 in terms of pathogenicity and evolutionary diversity?
Genomic DNA was isolated from the bacteria of interest, a genomic DNA library was prepared, and the genome was sequenced using methods such as PCR-based techniques and primer walking. The genome sequences were then annotated in a web-based annotation environment called MAGPIE. BLAST was used to predict proteins. Orthology was inferred when matches for CFT073 genes in either the MG1655 or EDL933 database had 90% identity or higher.
• The complete genome sequence for uropathogenic E. coli, strain CFT073.
• A three-way genome comparison revealed that only 39.2% of the combined set of proteins is common to all three strains studied.
• CFT073 lacks the genes for the type III secretion system and the phage- and plasmid-encoded toxins found in diarrheagenic E. coli, which accounts for the difference in disease potential between O157:H7 and CFT073.
• The CFT073 genome is rich in genes that encode fimbrial adhesins, autotransporters, iron-sequestration systems, and phase-switch recombinases.
• Significant differences exist between the pathogenicity islands of CFT073 and those of the other two strains studied.
• Many islands are acquired by different horizontal transfer events.
• How do the deletions of the islands responsible for the pathogenicity of the strains impact the bacteria?
• What is a species? How do we define species in microbes if strains share only 39.2% of their proteins?
• Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
• Identify common molecular signatures used to infer genomic identity and cohesion
• Differentiate between mobile elements and different modes of gene transfer
Depending on the environment in which E. coli resides, it will have different islands that provide different advantages. Ecotype diversity describes the different strains of a species that carry specific traits allowing them to reside in different parts of the body (environments). For example, in the urinary tract, genes that allow for adherence would let cells stay in the tract and not get washed away. We speculate that these islands are acquired by horizontal gene transfer, for example via a plasmid, phage, or conjugative transposon.
In class Day 1:
Assignment:
In class Day 2:
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.
Load in the packages you will use.
#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)
Then load in the data.
example_data1= data.frame(
number = c(1:16),
name = c("gummy", "sour gummy", "rod", "L gummy yellow", "gummy swirls" , "gummy spider", "gummy cokes", "gummy lines", "balls", "gummy fruit", "brick", "skittle", "mm", "twizzler", "kisses", "mutated"),
characteristics = c("bear", "bear", "rods", "large rods", "circle swirl", "spider", "cokes", "lines", "balls", "fruit", "brick", "round", "round", "twizzlers", "chocolate", "red mutated rods"),
occurences = c(102, 3, 173, 2, 3, 6, 3, 7, 24, 2, 18, 192, 221, 14, 16, 2)
)
Finally, use these data to create a table.
example_data1 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | gummy | bear | 102 |
| 2 | sour gummy | bear | 3 |
| 3 | rod | rods | 173 |
| 4 | L gummy yellow | large rods | 2 |
| 5 | gummy swirls | circle swirl | 3 |
| 6 | gummy spider | spider | 6 |
| 7 | gummy cokes | cokes | 3 |
| 8 | gummy lines | lines | 7 |
| 9 | balls | balls | 24 |
| 10 | gummy fruit | fruit | 2 |
| 11 | brick | brick | 18 |
| 12 | skittle | round | 192 |
| 13 | mm | round | 221 |
| 14 | twizzler | twizzlers | 14 |
| 15 | kisses | chocolate | 16 |
| 16 | mutated | red mutated rods | 2 |
For your community:
To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.
To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.
Load in these data.
example_data2 = data.frame(
x = c(1:126),
y = c(1,2,3,4,4,4,4,4,5,6,6,6,6,7,7,7,8,9,9,9,9,9,9,10,10,10,10,10,10,10,10,10,11,11,11,12,13,14,14,14,14,14,14,14,14,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16)
)
And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.
ggplot(example_data2, aes(x=x, y=y)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
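If you record the identity of each cell in the order you picked it, the cumulative species count for the collector's curve does not need to be tallied by hand. The sketch below is one way to do it in base R; the `picked` vector is a made-up picking order, not data from the example community.

```r
# Hypothetical order in which individuals were picked and classified.
picked <- c("mm", "skittle", "mm", "rod", "skittle", "gummy", "rod", "mm")

collectors_curve <- data.frame(
  # Cumulative number of individuals classified.
  x = seq_along(picked),
  # Cumulative number of species observed: an individual adds to the
  # count only the first time its species appears.
  y = cumsum(!duplicated(picked))
)

collectors_curve
```

The resulting data frame has the same `x` and `y` columns as `example_data2`, so it can be passed to the same `ggplot` call.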
For your sample:
Using the table from Part 1, calculate species diversity using the following indices or metrics.
\(\frac{1}{D}\) where \(D = \sum p_i^2\)
\(p_i\) = the fractional abundance of the \(i^{th}\) species
For example, for a community of 3 species with 2, 4, and 1 individuals each, \(\frac{1}{D}\) =
species1 = 2/(2+4+1)
species2 = 4/(2+4+1)
species3 = 1/(2+4+1)
1 / (species1^2 + species2^2 + species3^2)
## [1] 2.333333
The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects both the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that two communities can have the same number of species (equal richness) but a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.
Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.
\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)
\(S_{obs}\) = total number of species observed
a = species observed once
b = species observed twice or more
So for our previous example community of 3 species with 2, 4, and 1 individuals each, \(S_{chao1}\) =
3 + 1^2/(2*2)
## [1] 3.25
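The two hand calculations above can be wrapped in small functions so they are easy to repeat on your own counts. This is a sketch only: the function names are invented for this example, and `chao1_as_defined_here` follows the definition of b given in this problem set (species observed twice or more), whereas the classic Chao1 estimator uses only species observed exactly twice.

```r
# Simpson Reciprocal Index: 1 / sum(p_i^2).
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)   # fractional abundance of each species
  1 / sum(p^2)
}

# Chao1 as defined in this problem set (hypothetical function name).
chao1_as_defined_here <- function(counts) {
  counts <- counts[counts > 0]
  s_obs <- length(counts)     # total number of species observed
  a <- sum(counts == 1)       # species observed once
  b <- sum(counts >= 2)       # species observed twice or more
  s_obs + a^2 / (2 * b)       # note: undefined when b is 0
}

# The 3-species example community with 2, 4, and 1 individuals.
toy <- c(2, 4, 1)
inverse_simpson(toy)
chao1_as_defined_here(toy)
```

For the toy community these reproduce the hand-calculated values of 2.333333 and 3.25.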
We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.
library(vegan)
First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table with species as columns and samples as rows (of which you only have 1).
example_data1_diversity =
example_data1 %>%
select(name, occurences) %>%
spread(name, occurences)
example_data1_diversity
## balls brick gummy gummy cokes gummy fruit gummy lines gummy spider
## 1 24 18 102 3 2 7 6
## gummy swirls kisses L gummy yellow mm mutated rod skittle sour gummy
## 1 3 16 2 221 2 173 192 3
## twizzler
## 1 14
Then we can calculate the Simpson Reciprocal Index using the diversity function.
diversity(example_data1_diversity, index="invsimpson")
## [1] 4.869614
And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. Because specpool works from species incidence across samples and we have only one sample, it returns the observed richness here, so the value will be slightly different from what you’ve calculated above.
specpool(example_data1_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 16 16 0 16 0 16 16 0 1
In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.
For your sample:
What are the Simpson Reciprocal Indices for your sample and community using the R function? Community=4.869614 Sample=6.45
What are the chao1 estimates for your sample and community using the R function? Community=16 sample=14.4
Values for both Simpson Reciprocal Indices and Chao1 estimates are the same using hand calculations and R for both community and sample.
+ Verify that these values match your previous calculations.
If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.
Callahan BJ, Mcmurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker gene data analysis. The ISME Journal 11: 2639-2643. PMC5702726
Gaudet AD, Ramer LM, Nakonechny J, Cragg JJ, Ramer MS. 2010. Small-Group Learning in an Upper-Level University Biology Class Enhances Academic Performance and Student Attitudes Toward Group Work. PLoS ONE 5: 1-10. (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0015821)
Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Scientific Data 4:170158. PMC5663219
Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, Rio TGD, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data 4:170160. PMC5663217
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology 12:118–123. PMID19725865
Cordero OX, Ventouras L-A, Delong EF, Polz MF. 2012. Public good dynamics drive evolution of iron acquisition strategies in natural bacterioplankton populations. Proceedings of the National Academy of Sciences 109:20059–20064. PMC3523850
Giovannoni SJ. 2012. Vitamins in the sea. Proceedings of the National Academy of Sciences 109:13888–13889. PMC3435215
Lundin D, Severin I, Logue JB, Östman Ö, Andersson AF, Lindström ES. 2012. Which sequencing depth is sufficient to describe patterns in bacterial α- and β-diversity? Environmental Microbiology Reports 4:367–372. (https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1758-2229.2012.00345.x)
Morris JJ, Lenski RE, Zinser ER. 2012. The Black Queen Hypothesis: Evolution of Dependencies through Adaptive Gene Loss. mBio 3:e00036-12. PMC3315703
Thompson JR, Pacocha S, Pharino C, Klepac-Ceraj V, Hunt DE, Benoit J, Sarma-Rupavtarm R, Distel DL, Polz MF. 2005. Genotypic Diversity Within a Natural Coastal Bacterioplankton Population. Science 307:1311–1313. PMID15731455
Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM, Herndl GJ. 2006. Microbial diversity in the deep sea and the underexplored ‘‘rare biosphere.’’ Proceedings of the National Academy of Sciences 103:12115–12120. PMC1524930
Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Scientific Data 4:170159. PMC5663218
Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou SR, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences 99:17020–17024. PMC139262
---
title: "Project 1 report"
author: "Group 6"
date: "version April 06, 2018"
output:
  html_document:
    df_print: paged
    toc: yes
    toc_float:
      collapsed: no
---
Analysis of sequences obtained from Saanich Inlet using mothur and QIIME2 revealed peak community alpha-diversity at a depth of approximately 100 m, where dissolved oxygen is approximately 38 μM. The lowest diversity was observed at greater depths with lower oxygen levels. Further taxonomic analysis revealed Proteobacteria as the most abundant phylum in all samples. To investigate how microbial communities differ across depth and oxygen gradients within Saanich Inlet, we focused on the phylum Chloroflexi. Chloroflexi abundance was positively correlated with depth; this correlation was significant only when the analysis was based on QIIME2-generated ASVs, not mothur-generated OTUs. Chloroflexi abundance was also negatively correlated with oxygen concentration, and again significance was reported only for QIIME2-generated ASVs. QIIME2 identified four classes within the phylum Chloroflexi (Dehalococcoidia, Anaerolineae, SAR202, and JG30-KF-CM66), while mothur identified two (Anaerolineae and SAR202). Lastly, changes in the abundance of individual OTUs and ASVs were tested for correlation with depth and oxygen concentration; none of these correlations was significant. The abundances of 24 (of 34) OTUs and 38 (of 47) ASVs were positively correlated with depth, while all OTUs and 46 ASVs were negatively correlated with oxygen concentration. However, the presence of apparent outliers in the data from both mothur and QIIME2 may have biased the trend lines and reduced model significance, although the absence of sufficient data prevents us from classifying these points as outliers. The differences observed between mothur and QIIME2 show that the choice of pipeline affects the results.
Saanich Inlet is a seasonally anoxic fjord [1] located between Vancouver Island and the Saanich Peninsula. It is 24 km long and has a basin of up to 234 meters in depth [2]. It has a 75-meter sill which acts to protect the deeper waters [3]. Because of this sill and the constantly high input of organic material from freshwater discharge and primary production in surface waters, conditions below 110 meters are anoxic [3]. Oxygen is replenished seasonally, mostly in the fall, which modifies the oxygen gradient and thereby the environmental conditions for the microbial community that inhabits the inlet [3]. Dissolved oxygen increases gradually from a minimum concentration at depth to its peak concentration at the surface, due to phytoplankton metabolism and gas exchange between the atmosphere and surface waters [3]. Nitrate reduction by denitrifiers happens mostly in the deep water following oxygenation [3]. This results in a steep nitrate gradient across depths within the fjord [3]. A study by Zaikova et al. found that microbial diversity was highest in the hypoxic transition zone and decreased within the anoxic basin waters [1]. It is vital to study the roles of various microorganisms within Saanich Inlet in order to understand how they affect environmental conditions such as greenhouse gases, methane, and denitrification on a larger scale in the world’s oceans [3].
Operational Taxonomic Units (OTUs) are defined as clusters of organisms that have been grouped based on the DNA sequence similarity of a specific DNA segment known as a taxonomic marker gene [4]. The grouped DNA sequences differ by less than a fixed and arbitrary sequence dissimilarity threshold, often 3% [5]. This process of clustering on a specific DNA segment, known as DNA barcoding, allows for rapid, targeted, and high-throughput analysis of genetic variation in a specific genomic region such as 16S/18S rRNA sequences, leading to large-scale characterization of microbial communities [4, 6]. However, more recent amplicon sequence variant (ASV) methods offer finer resolution and are independent of the dissimilarity thresholds used to define OTUs. ASV methods have shown higher specificity and sensitivity than OTU methods because they distinguish sequence variants that differ by as little as a single nucleotide and denoise the sequences by discriminating biological sequences from errors. This is done based on the expectation that biological sequences are more abundant and more repeatedly observed than error-containing sequences [5].
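To make the 3% threshold concrete, the following base R sketch compares two made-up, already aligned reads. The `dissimilarity` helper and the reads are assumptions for illustration only; actual OTU clustering works on many sequences at once, and ASV methods rely on error models rather than a simple equality check.

```r
# Fraction of mismatched positions between two aligned,
# equal-length sequences (hypothetical helper).
dissimilarity <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  mean(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Two invented 100-base reads that differ at two positions.
read_1 <- strrep("ACGT", 25)
read_2 <- read_1
substr(read_2, 10, 10) <- "A"   # position 10 was "C"
substr(read_2, 50, 50) <- "A"   # position 50 was "C"

d <- dissimilarity(read_1, read_2)

# Under a 3% dissimilarity threshold the reads fall in the same OTU...
same_otu <- d < 0.03
# ...but they are still distinct exact sequence variants.
same_variant <- d == 0

c(dissimilarity = d, same_otu = same_otu, same_variant = same_variant)
```

Two reads that differ at 2 of 100 positions are merged into one OTU at the 3% threshold, while a method that resolves single-nucleotide differences keeps them apart, provided both are judged to be real biological sequences.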
Using OTU and ASV data for samples collected from the Saanich Inlet, we investigated how microbial communities differ across depth and oxygen gradients within the Saanich Inlet, with a particular focus on the phylum Chloroflexi. We found Chloroflexi of interest because its members are highly abundant in marine sediments [7] and present a broad spectrum of metabolic characteristics such as anoxygenic photosynthesis [8], obligate aerobic and anaerobic heterotrophy [9], and even predation with a gliding motility [10]. Like many other microbes, members of Chloroflexi can be a challenge to grow in culture, with some classes yet to be cultured successfully, which has made characterizing their metabolisms a challenge [11, 19]. However, new sequencing technologies have made it possible to analyze the genome of and characterize these uncultured microbes [11-19].
In our Saanich Inlet data, we were able to identify four classes within the phylum Chloroflexi: Dehalococcoidia, Anaerolineae, SAR202, and JG30-KF-CM66. Members of the class Dehalococcoidia are widely distributed throughout marine sediments [11] and anoxic deep waters [12]. Dehalococcoidia grow via anaerobic organohalide respiration and are extensively studied for their potential in the bioremediation of chloride-contaminated water and soil [11, 12]. As for the class Anaerolineae, despite its members being prevalent in various ecosystems, only a few strains have been successfully cultured [13]. Anaerolineae compose one of the core populations of anaerobic bacteria involved in anaerobic digestion and possess key genes for catalyzing cellulose hydrolysis [14]. The SAR202 cluster was one of the earliest discoveries of marine bacteria which inhabited the aphotic zone [15], and since then SAR202 has been found to be ubiquitous throughout the deep ocean [16]. Members of SAR202 are involved in metabolizing organosulfur compounds and likely play a major role in sulfur cycling [17]. JG30-KF-CM66 is a relatively uncharacterized clade of acidobacteria, but it has been identified in soft coal slags [18] and anoxic ocean water [19]. The characteristics of each of these classes impact how the spatial distribution of the Chloroflexi phylum differs within Saanich Inlet, and we also set out to determine if and how different sequence analysis pipelines would impact these biological conclusions.
Water samples from 16 depths (10-200 m) from cruise 72 were collected at station S3 (48°35.500 N, 123°30.300 W) onboard the MSV John Strickland. Geochemical and multi-omic information, which included 16S rRNA gene amplicon sequences (V4-5 hypervariable regions) and dissolved O2, was extracted for each depth [20, 21]. Data from 7 depths (10, 100, 120, 135, 150, 165, 200 m) were further analyzed. Dissolved O2 was measured onboard with a Sea-Bird SBE 43 dissolved oxygen sensor [20]. At each depth, 2 L of water was filtered onto a 0.22 μm Sterivex filter and stored until amplicon sequencing, which was carried out on the Illumina MiSeq platform at the Joint Genome Institute. Base qualities were encoded in Phred33, and primers 515F and 806R were used for 16S rRNA gene amplification [21].
Reads were independently processed using mothur- and QIIME2-based pipelines, which group sequences into OTUs and ASVs, respectively. The resulting data were assembled into phyloseq objects for downstream analysis in R.
Sequenced reads were first assembled into contigs, which were screened and de-duplicated so that the remainder 1) were between 200 bp and 600 bp long, 2) had no homopolymer runs longer than 8 bases, and 3) had no ambiguous bases.
make.file(inputdir=[filePath]/Saanich, prefix=Saanich)
make.contigs(file=Saanich.files, processors=10)
summary.seqs(fasta=Saanich.trim.contigs.fasta)
screen.seqs(fasta=Saanich.trim.contigs.fasta, group=Saanich.contigs.groups, maxambig=0, maxhomop=8, minlength=200, maxlength=600)
unique.seqs(fasta=Saanich.trim.contigs.good.fasta)
summary.seqs(fasta=Saanich.trim.contigs.good.unique.fasta, count=Saanich.trim.contigs.good.count_table)
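The three screening criteria passed to screen.seqs above (length, ambiguous bases, homopolymer runs) can be mimicked in a few lines of base R. This is only a sketch to make the criteria concrete: the helper `passes_screen` and the toy sequences are hypothetical, and mothur's own implementation may differ in detail.

```r
# Hypothetical helper that applies the same thresholds as the screen.seqs call.
passes_screen <- function(seq, min_len = 200, max_len = 600,
                          max_ambig = 0, max_homop = 8) {
  len <- nchar(seq)
  # Count characters other than A, C, G, T as ambiguous bases
  n_ambig <- lengths(regmatches(seq, gregexpr("[^ACGT]", seq)))
  # Longest run of a single repeated base in each sequence
  longest_run <- vapply(strsplit(seq, ""),
                        function(bases) max(rle(bases)$lengths),
                        integer(1))
  len >= min_len & len <= max_len & n_ambig <= max_ambig & longest_run <= max_homop
}

# Invented toy contigs, one per failure mode
toy_contigs <- c(
  good        = strrep("ACGT", 60),                         # 240 bp, clean
  too_short   = strrep("ACGT", 10),                         # 40 bp
  ambiguous   = paste0(strrep("ACGT", 60), "N"),            # contains one N
  homopolymer = paste0(strrep("ACGT", 60), strrep("A", 9))  # run of 9 A's
)

screen_result <- passes_screen(toy_contigs)
print(screen_result)
```

Only the first toy contig survives, since each of the others violates exactly one criterion.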
Contigs were aligned to the SILVA database (release 128) and screened so that only those spanning alignment positions 10368 to 25434 were kept. Uninformative alignment columns were then removed, and the resulting sequences were de-duplicated again.
align.seqs(fasta=Saanich.trim.contigs.good.unique.fasta, reference=silva.nr_v128.align, flip=T, processors=10)
summary.seqs(fasta=Saanich.trim.contigs.good.unique.align, count=Saanich.trim.contigs.good.count_table)
screen.seqs(fasta=Saanich.trim.contigs.good.unique.align, count=Saanich.trim.contigs.good.count_table, summary=Saanich.trim.contigs.good.unique.summary, start=10368, end=25434, processors=10)
filter.seqs(fasta=Saanich.trim.contigs.good.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta=Saanich.trim.contigs.good.unique.good.filter.fasta, count=Saanich.trim.contigs.good.good.count_table)
Sequences were then pre-clustered (allowing up to 3 differences), chimeric sequences were removed, and the remaining sequences were clustered de novo at 97% sequence similarity to determine the final OTUs.
pre.cluster(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.fasta, count=Saanich.trim.contigs.good.unique.good.filter.count_table, diffs=3)
summary.seqs(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.count_table)
chimera.uchime(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
remove.seqs(fasta=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.count_table, accnos=Saanich.trim.contigs.good.unique.good.filter.unique.precluster.denovo.uchime.accnos)
dist.seqs(fasta=Saanich.final.fasta, processors=15)
cluster.split(column=Saanich.final.dist, count=Saanich.final.count_table, method=opti, processors=10, large=T)
make.shared(list=Saanich.final.opti_mcc.unique_list.list, count=Saanich.final.count_table, label=0.03)
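To make the 97% similarity threshold (label=0.03) concrete, the sketch below computes a simple pairwise distance between aligned sequences in base R. It assumes that distance is the fraction of mismatched positions between two gap-free sequences of equal length, which simplifies how mothur treats gaps. The helper `count_mismatches` and the toy sequences are hypothetical.

```r
# Count mismatched positions between two aligned, equal-length sequences.
count_mismatches <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

reference <- strrep("ACGT", 25)                                 # 100 bp toy sequence
near_seq  <- paste0("TT", substr(reference, 3, 100))            # 2 mismatches
far_seq   <- paste0(strrep("T", 8), substr(reference, 9, 100))  # 6 mismatches

mismatches <- c(near = count_mismatches(reference, near_seq),
                far  = count_mismatches(reference, far_seq))
distances  <- mismatches / nchar(reference)

# Sequences within a distance of 0.03 could fall into the same 97% OTU
same_otu <- distances <= 0.03
print(data.frame(mismatches, distances, same_otu))
```

With 100 bp toy sequences, 2 mismatches give a distance of 0.02 (within the threshold) and 6 mismatches give 0.06 (outside it). The same counting logic explains the pre.cluster setting diffs=3, which merges sequences that differ at up to 3 positions.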
Clusters were classified using the SILVA database, and resulting taxonomies were condensed.
classify.seqs(fasta=Saanich.final.fasta, count=Saanich.final.count_table, template=silva.nr_v128.align, taxonomy=silva.nr_v128.tax, cutoff=80, processors=10)
classify.otu(list=Saanich.final.opti_mcc.unique_list.list, taxonomy=Saanich.final.nr_v128.wang.taxonomy, count=Saanich.final.count_table, label=0.03, cutoff=80, basis=otu, probs=F)
Reads were demultiplexed and imported into QIIME2. Per-base read qualities were visualized to determine downstream trim parameters.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path file/path/here/pe-33-manifest.csv \
--output-path paired-end-demux.qza \
--source-format PairedEndFastqManifestPhred33
qiime demux summarize \
--i-data paired-end-demux.qza \
--o-visualization demux.qzv
ASVs were generated with the DADA2 protocol, using custom trim parameters for quality control. The resulting ASVs were classified against the SILVA database (release 119) at a 99% similarity threshold.
qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--o-table table \
--o-representative-sequences rep-seqs \
--p-trim-left-f 5 \
--p-trim-left-r 12 \
--p-trunc-len-f 240 \
--p-trunc-len-r 184
qiime feature-classifier classify-sklearn \
--i-classifier silva-119-99-515-806-nb-classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
The ASV table was then converted to a text format, which was used to create a phyloseq object.
qiime tools export table.qza \
--output-dir exported-feature-table
biom convert -i exported-feature-table/feature-table.biom -o feature-table.tsv \
--to-tsv
Analysis was completed in R v3.4.3 [5] using the following packages.
library(tidyverse)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(stringr)
library(magrittr)
library(knitr)
library(gridExtra)
library(grid)
library(randomcoloR)
Alpha-diversities of clusters identified by mothur and QIIME2 from each sample were measured by the Shannon diversity index and the Chao1 richness estimator. Alpha-diversities were plotted against sample depth and oxygen concentration for both clustering methods, and were fitted using local polynomial regression models where appropriate. Relative abundances of all phylum level classifications produced by mothur and QIIME2 were also plotted for each sample.
Relative abundances of Chloroflexi OTUs and ASVs amongst all clusters were plotted, both as a whole and individually, across depth and oxygen gradients. Significances of correlations between these variables were based on linear regression models, as all variables are continuous and there is a lack of evidence to suggest curvilinear relationships between them.
# generate random colours for use in figures.
palette <- distinctColorPalette(40)
Data were loaded into R and samples normalized to 100,000 sequences per sample.
load("mothur_phyloseq.RData")
load("qiime2_phyloseq.RData")
# random seed set for reproducibility
set.seed(4831)
m.norm = rarefy_even_depth(mothur, sample.size=100000)
q.norm = rarefy_even_depth(qiime2, sample.size=100000)
Relative abundance percentages were calculated for the data.
m.percent = transform_sample_counts(m.norm, function(x) 100 * x/sum(x))
q.percent = transform_sample_counts(q.norm, function(x) 100 * x/sum(x))
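The two normalization steps above (subsampling to an even depth, then converting counts to percentages) can be illustrated on a single toy sample in base R. This sketch draws reads without replacement, which is one common definition of rarefaction. phyloseq's `rarefy_even_depth()` has its own `replace` argument that controls this choice, so the sketch is not a reimplementation of that function. The counts and the helper name `rarefy_counts` are hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the toy example is repeatable

# Invented read counts for one sample
toy_counts <- c(taxonA = 600, taxonB = 300, taxonC = 100)

# Rarefy by drawing individual reads without replacement down to a fixed depth
rarefy_counts <- function(counts, depth) {
  reads <- rep(names(counts), times = counts)        # one entry per read
  drawn <- sample(reads, size = depth, replace = FALSE)
  table(factor(drawn, levels = names(counts)))       # keep taxa that drop to zero
}

rarefied <- rarefy_counts(toy_counts, depth = 200)

# Same transformation as the function passed to transform_sample_counts()
percent <- 100 * rarefied / sum(rarefied)
print(rbind(rarefied, percent))
```

After rarefaction every sample has the same total, so percentages are directly comparable between samples, at the cost of discarding reads from deeper samples.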
The phylum Chloroflexi was chosen.
phylum_name_mothur = "Chloroflexi"
phylum_name_qiime2 = "D_1__Chloroflexi"
Shannon diversity index and Chao1 were calculated for the total microbial community across depth and oxygen concentration gradients for both mothur and QIIME2.
# Alpha-diversity of total community for mothur
m.alpha = estimate_richness(m.norm, measures = c("Chao1", "Shannon"))
m.meta.alpha = full_join(rownames_to_column(m.alpha),
rownames_to_column(data.frame(m.percent@sam_data)), by = "rowname")
m.shannon.depth.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Shannon)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Shannon)) +
labs(title="Mothur", y="Shannon diversity index", x=NULL)
m.chao1.depth.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Chao1)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Chao1)) +
labs(title="Mothur", y="Chao1 richness estimator", x="Depth (m)")
m.shannon.o2.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=O2_uM, y=Shannon)) +
# geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Shannon)) +
labs(title="Mothur", y="Shannon diversity index", x=NULL)
m.chao1.o2.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=O2_uM, y=Chao1)) +
# geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Chao1)) +
labs(title="Mothur", y="Chao1 richness estimator", x="Oxygen (uM)")
# Alpha-diversity of total community for QIIME2
q.alpha = estimate_richness(q.norm, measures = c("Chao1", "Shannon"))
q.meta.alpha = full_join(rownames_to_column(q.alpha),
rownames_to_column(data.frame(q.percent@sam_data)), by = "rowname")
q.shannon.depth.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Shannon)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Shannon)) +
labs(title="Qiime2", y=NULL, x=NULL)
q.chao1.depth.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Chao1)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Chao1)) +
labs(title="Qiime2", y=NULL, x="Depth (m)")
q.shannon.o2.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=O2_uM, y=Shannon)) +
# geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Shannon)) +
labs(title="Qiime2", y=NULL, x=NULL)
q.chao1.o2.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=O2_uM, y=Chao1)) +
# geom_smooth(method='auto', aes(x=as.numeric(O2_uM), y=Chao1)) +
labs(title="Qiime2", y=NULL, x="Oxygen (uM)")
# Plotting depth graph
grid.arrange(m.shannon.depth.plot, q.shannon.depth.plot, m.chao1.depth.plot, q.chao1.depth.plot, ncol=2, top=textGrob("Figure 1 Alpha-diversity across Depth",gp=gpar(fontsize=16,font=3)))
The same patterns of alpha-diversity (Shannon diversity index and Chao1 richness estimator) can be observed across depth for both mothur and QIIME2 (Fig. 1). Diversity is slightly lower in surface waters (10 m) than at 100 m. Diversity peaks at ~100-120 m and then decreases with depth, with a slight increase at 200 m for all measures except the Shannon diversity index for QIIME2.
Note, however, that despite the similar alpha-diversity patterns, the two pipelines differ in magnitude: across all depths, the mothur OTU analysis gave a lower alpha-diversity than the QIIME2 ASV analysis when measured with the Shannon diversity index, and a higher alpha-diversity when measured with Chao1.
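The two indices can move in opposite directions because they measure different things: Shannon combines richness with evenness, whereas Chao1 estimates richness from the number of rare taxa. The base R sketch below uses the standard Shannon formula and the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singletons and doubletons. The two toy communities are invented, and the functions are illustrative rather than the exact code called by `estimate_richness()`.

```r
# Shannon diversity: H = -sum(p * ln(p)) over taxa with non-zero counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Bias-corrected Chao1 richness estimate
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)   # observed taxa
  f1 <- sum(counts == 1)     # singletons
  f2 <- sum(counts == 2)     # doubletons
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

even_community   <- rep(25, 4)            # 4 equally abundant taxa, no rare taxa
skewed_community <- c(94, 1, 1, 1, 1, 2)  # 1 dominant taxon plus 5 rare taxa

index_table <- data.frame(
  community = c("even", "skewed"),
  Shannon   = c(shannon(even_community), shannon(skewed_community)),
  Chao1     = c(chao1(even_community), chao1(skewed_community))
)
print(index_table)
```

The skewed community has the higher Chao1 estimate (9 versus 4) but the lower Shannon index, which is the same kind of split seen here between the OTU and ASV results.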
# Plotting oxygen graph
grid.arrange(m.shannon.o2.plot, q.shannon.o2.plot, m.chao1.o2.plot, q.chao1.o2.plot, ncol=2, top=textGrob("Figure 2 Alpha-diversity across Oxygen Concentration",gp=gpar(fontsize=16,font=3)))
Looking at Shannon diversity across oxygen concentration (Fig. 2), we find that at equivalent depths QIIME2 has a greater diversity than mothur. However, the pattern exhibited by both mothur and QIIME2 data is still similar. The three lowest diversity points (note for mothur: 2 points at 2.35 overlap) are at an oxygen concentration of 0 uM, while the highest diversity is found at an oxygen concentration of ~38 uM. The band of 95% confidence intervals was not plotted due to the lack of data between ~38 uM and ~217 uM of oxygen.
Comparing Chao1 at different oxygen levels for mothur and QIIME2 shows that the patterns somewhat differ. While the three lowest diversity points are still at 0 uM of oxygen, for mothur the highest diversity in terms of Chao1 is at an oxygen concentration of ~38 uM, while for QIIME2 it is at an oxygen concentration of ~32 uM. For both, oxygen concentration of ~217 uM shows a notable decrease in diversity. Chao1 exhibited a relatively greater drop at ~217 uM of oxygen compared to Shannon.
# Mothur
m.phyla.plot = m.percent %>%
plot_bar(fill="Phylum")+
geom_bar(aes(fill=Phylum), stat="identity")+
labs(title="Figure 3 Phyla across Samples for Mothur", y="Abundance (%)")+
scale_fill_manual(values=palette)
# QIIME2
q.phyla.plot = q.percent %>%
plot_bar(fill="Phylum")+
geom_bar(aes(fill=Phylum), stat="identity")+
labs(title="Figure 4 Phyla across Samples for QIIME2", y="Abundance (%)")+
scale_fill_manual(values=palette)
At the phylum level, 28 and 29 taxa were identified with mothur and QIIME2, respectively (Fig. 3, 4). In both pipelines, four of these phyla dominated the community composition in terms of abundance: Proteobacteria, Bacteroidetes, Thaumarchaeota and Actinobacteria (from most to least abundant). Other noticeably abundant phyla include Cyanobacteria, Deferribacteres, Euryarchaeota, Firmicutes, Gemmatimonadetes, Marinimicrobia, Nitrospinae, Planctomycetes and Verrucomicrobia. Our phylum of interest, Chloroflexi, makes up 0 to 6% of the microbial community in the collected samples, depending on depth. mothur appears to use a more specific naming system than QIIME2, which results in more descriptive labelling of the population composition in the former.
# Significance across depth
m.chlor.lm = m.norm %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
q.chlor.lm = q.norm %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
taxon.abundance = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance <- rbind(taxon.abundance, m.chlor.lm$coefficients["Depth_m",])
taxon.abundance <- rbind(taxon.abundance, q.chlor.lm$coefficients["Depth_m",])
rownames(taxon.abundance) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
kable(taxon.abundance,caption="Table 1 Correlation Data of Chloroflexi Phylum across Depth")
| | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| mothur | 1.327529 | 0.6389862 | 2.077554 | 0.0923485 |
| QIIME2 | 2.622128 | 0.4212043 | 6.225311 | 0.0015644 |
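The columns of Table 1 are linked by simple arithmetic, which offers a quick sanity check on the regression output. The t value is the slope estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution. The sketch below assumes an ordinary least-squares fit to the 7 analyzed depths, which leaves 7 - 2 = 5 residual degrees of freedom. That degrees-of-freedom value is an inference from the sample count, not something stated in the table.

```r
# Values copied from Table 1
estimate  <- c(mothur = 1.327529, QIIME2 = 2.622128)
std_error <- c(mothur = 0.6389862, QIIME2 = 0.4212043)

n_samples <- 7                   # depths analyzed (assumed to be the rows in the fit)
resid_df  <- n_samples - 2       # intercept and slope are estimated

t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = resid_df)  # two-sided p-value

print(round(cbind(t_value, p_value), 6))
```

The recomputed values agree with the tabulated t values and p-values, which is consistent with each model having been fitted to seven data points.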
m.abd.depth.plot <- m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
labs(title="Mothur", y="Abundance (%)", x="Depth (m)")
q.abd.depth.plot <- q.percent %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
labs(title="QIIME2", y=NULL, x="Depth (m)")
# Significance across oxygen concentrations
m.chlor.lm.ox = m.norm %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ O2_uM, .) %>%
summary()
q.chlor.lm.ox = q.norm %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ O2_uM, .) %>%
summary()
taxon.abundance.ox = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance.ox <- rbind(taxon.abundance.ox, m.chlor.lm.ox$coefficients["O2_uM",])
taxon.abundance.ox <- rbind(taxon.abundance.ox, q.chlor.lm.ox$coefficients["O2_uM",])
rownames(taxon.abundance.ox) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance.ox) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
kable(taxon.abundance.ox,caption="Table 2 Correlation Data of Chloroflexi Phylum across Oxygen Concentration")
| | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| mothur | -0.750471 | 0.5865861 | -1.279387 | 0.2569128 |
| QIIME2 | -1.731996 | 0.5762708 | -3.005525 | 0.0299088 |
m.abd.o2.plot <- m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>%
ggplot() +
geom_point(aes(x=O2_uM, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
labs(title="Mothur", y="Abundance (%)", x="O2 (uM)")
q.abd.o2.plot <- q.percent %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>%
ggplot() +
geom_point(aes(x=O2_uM, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
labs(title="QIIME2", y=NULL, x="O2 (uM)")
# Plotting depth graph
grid.arrange(m.abd.depth.plot, q.abd.depth.plot, ncol=2, top=textGrob("Figure 5 Chloroflexi Abundance across Depth",gp=gpar(fontsize=16,font=3)))
# Plotting oxygen graph
grid.arrange(m.abd.o2.plot, q.abd.o2.plot, ncol=2, top=textGrob("Figure 6 Chloroflexi Abundance across Oxygen Concentration",gp=gpar(fontsize=16,font=3)))
Linear regression analysis of Chloroflexi relative abundance across depth revealed variations between mothur’s OTU and QIIME2’s ASV clustering (Fig. 5). Abundance of ASV clusters revealed a significant correlation with depth (p<0.05), while OTU clusters did not (Table 1). Both correlations were found to be positive.
Similarly, linear regression analysis of Chloroflexi relative abundance across oxygen concentration (Fig. 6) revealed a significant correlation of oxygen concentration with ASV clusters (p<0.05), but not with OTU clusters (Table 2). Both correlations were found to be negative.
# Number of OTUs
m.tax_table = data.frame(m.norm@tax_table)
m.filtered = m.tax_table %>%
rownames_to_column('OTU') %>%
filter(Phylum==phylum_name_mothur) %>%
column_to_rownames('OTU')
m.rownumber = nrow(m.filtered)
# Classes in OTUs
m.classes = m.filtered %>%
select('Class') %>%
unique %>%
summarise(Classes = toString(Class))
# Number of ASVs
q.tax_table = data.frame(q.norm@tax_table)
q.filtered = q.tax_table %>%
rownames_to_column('ASV') %>%
filter(Phylum==phylum_name_qiime2) %>%
column_to_rownames('ASV')
q.rownumber = nrow(q.filtered)
# Classes in ASVs
q.classes = q.filtered %>%
select('Class') %>%
unique %>%
summarise(Classes = toString(Class))
For Chloroflexi, 34 OTUs and 47 ASVs were found. The OTUs represent the classes SAR202_clade and Anaerolineae, while the ASVs represent the classes D_2__JG30-KF-CM66, D_2__Anaerolineae, D_2__uncultured, D_2__Dehalococcoidia and D_2__SAR202 clade.
# Example for linear model
otu_stats = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
for (otu in row.names(m.filtered)){
linear_fit = m.norm %>%
psmelt() %>%
filter(OTU==otu) %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
otu_data = linear_fit$coefficients["Depth_m",]
otu_stats <- rbind(otu_stats, otu_data)
}
colnames(otu_stats)<- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
row.names(otu_stats) <- row.names(m.filtered)
otu_stats = cbind(data.frame(Class = m.filtered$Class), Genus = m.filtered$Genus, otu_stats)
sorted = arrange(rownames_to_column(otu_stats),Estimate)%>% column_to_rownames(var="rowname")
lm.depth.otus = kable(sorted,caption="Table A1 Correlation data of Chloroflexi OTUs Abundance with Depth")
# Example for correlation graph
m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance)) +
geom_smooth(method='lm', aes(x=Depth_m, y=Abundance)) +
facet_wrap(~OTU, scales="free_y") +
labs(title="Figure 7 Abundance of Chloroflexi OTUs across Depth") +
xlab("Depth (m)") +
ylab("Abundance (%)") +
theme(axis.text.x = element_text(angle = 90))
Linear model statistics were performed for the abundance of each OTU and ASV in relation to depth and oxygen concentration (Appendix A Table A1-A4). The linear models were subsequently plotted (Fig. 7-10). No significant correlations were found between any individual OTUs/ASVs abundance and depth or oxygen concentration (p > 0.05 for all).
Although none of the correlations were significant, mothur and QIIME2 showed similar trends. For mothur, ten of the 34 OTUs had a negative correlation between abundance and depth (the rest were positive), while for QIIME2, nine of the 47 ASVs had a negative correlation (the rest were positive). For abundance versus oxygen concentration, all mothur OTUs and all but one of the QIIME2 ASVs had a negative correlation.
Linear model statistics were performed for the abundance of each OTU and ASV in relation to hydrogen sulfide concentration (Appendix A Table A5 & A6). The linear models were subsequently plotted (Fig. 11 & 12). Significant positive correlations between abundance and hydrogen sulfide concentration were found for 15 and 19 individual OTUs and ASVs, respectively. For OTUs and ASVs, p-values were <0.05. Classes associated with OTUs and ASVs positively correlating with hydrogen sulfide concentration include Anaerolineae, Dehalococcoidia and the SAR202 clade.
QIIME2 identified more individual members of Chloroflexi that significantly and positively correlate with hydrogen sulfide concentration than mothur did (19 ASVs versus 15 OTUs). In addition, both the significance of the results and the strength of the positive correlation with hydrogen sulfide concentration were generally greater with QIIME2 than with mothur.
Saanich Inlet provides a good model for the study of oxygen minimum zones and the microbial dynamics that shape them. The microbial diversity analyses we carried out with the two bioinformatic pipelines, mothur and QIIME2, resulted in mostly similar patterns, but there were differences in the details. Both pipelines found that the Shannon diversity index peaked at 100 m; however, the Chao1 richness estimate for the total community peaked at 100 m for mothur and at 120 m for QIIME2. We see the same discrepancy between the mothur and QIIME2 Chao1 richness estimates when looking at alpha-diversity across dissolved oxygen concentrations (Fig. 2). As for discrepancies between Shannon diversity and Chao1 richness, we found that at a dissolved oxygen concentration of ~217 uM, Chao1 dropped relatively more than Shannon for both mothur and QIIME2. Because Chao1 estimates richness alone while Shannon also accounts for evenness, this could indicate increased species evenness despite reduced species richness at ~217 uM. The two pipelines also agreed that Saanich Inlet is dominated by only 4-5 phyla, with the phylum Chloroflexi making up 0-6% of the microbial community depending on the sample depth.
The phylum Chloroflexi contains bacteria with different metabolic characteristics, such as aerobic thermophiles, anoxygenic phototrophs, and anaerobic halorespirers, which use halogenated organics as electron acceptors [22]. In our analysis, we found that Chloroflexi abundance was positively correlated with depth (Table 1). This may be a result of oxygen concentrations decreasing with depth: a negative correlation between Chloroflexi abundance and oxygen concentration was also found (Table 2). Interestingly, these results were significant only in the QIIME2 analysis, not in the mothur analysis. This discrepancy, discussed below, shows that the analysis method can determine whether results are significant. The results indicate a preference for members of Chloroflexi to inhabit anoxic habitats at depth within Saanich Inlet, which is supported by previously mentioned research on this phylum [9, 14, 19]. Since Chloroflexi encompasses such vastly different classes of bacteria with disparate metabolic behavior, it is important to note that the classes within this phylum have different oxygen requirements, and some of them also thrive in oxic environments [9, 14, 19]. This may explain why the abundance versus depth and oxygen concentration results with mothur were not significant (Table 1 & 2). However, we identified only a few classes within Chloroflexi from our data, and each of those classes appears to be anaerobic [12-16], so the potential outliers might be a result of something else, such as other nutrients. Anoxic oceanic zones such as the deep waters of Saanich Inlet may provide an environment in which anaerobic members of Chloroflexi can be more competitive than other phyla and dominate the microbial population.
The richness within Chloroflexi depends strongly on which bioinformatic pipeline is used for the analysis. mothur identified only 34 OTUs in 2 classes within Chloroflexi, while QIIME2 identified 47 ASVs in 5 classes. However, QIIME2 was unable to identify any genera, while mothur was (Table A1-3). Therefore, the taxonomic depth required may dictate which pipeline should be used in future analyses. Correlations between individual OTUs and ASVs within Chloroflexi and depth or oxygen concentration were not significant (Appendix A Table A1-A4). The linear models in Fig. 7-10 show similar trends between the individual OTUs and ASVs and depth and oxygen concentration; however, glaring single outliers weaken any correlation among the data points. These outliers are present in all analyses but do not occur at the same depths or oxygen levels for the various ASVs and OTUs studied, which may mean that there is no pattern or rationale for their occurrence. More data would make it possible either to legitimize these data points and integrate them, or to confirm them as outliers and exclude them from the results.
Strong, significant positive correlations were found between various members of the Chloroflexi phylum and hydrogen sulfide concentrations. Classes correlating with hydrogen sulfide levels include Anaerolineae, Dehalococcoidia and the SAR202 clade (Fig. 11 & 12, Appendix A Table A5 & A6). These results are supported by previous studies, which have found that members of the SAR202 cluster within the Chloroflexi phylum play a major role in the sulfur cycle in the dark water column. Some of these members have pathways for sulfur reduction and could be responsible for the hydrogen sulfide concentrations at depth. As previously stated, they also have the potential to metabolize a variety of organosulfur compounds [17]. In addition, single-cell genomics studies of members of the Dehalococcoidia class within the Chloroflexi phylum highlighted their association with marine sediments and sulfur cycling [11].
Differences in bioinformatic pipelines for microbial ecology data analysis may lead to differences in analytical outcomes, and can result in misidentified species in different habitats or incorrectly determined trends. As we observed in this study, although mothur and QIIME2 often produced similar patterns, they did not agree on details. While QIIME2 led us to conclude that four classes (plus one uncultured class) of Chloroflexi were present in our samples, mothur identified only two. These differences are concerning: depending on the pipeline we use, we might miss organisms we have collected, or we might identify organisms that are in reality not there. Consequently, we could draw the wrong conclusions about ecosystems and the interplay of their inhabitants. Furthermore, whether we find significant correlations can also depend on the pipeline used. While Chloroflexi abundance was found to correlate significantly with depth and oxygen concentration for QIIME2, the correlation was not significant for mothur. In fact, the p-values differed by roughly one to two orders of magnitude (Table 1 & 2). This emphasizes the concern that the significance of findings may rest upon the choice of pipeline and clustering paradigm, leading to false positives and false negatives. It also highlights the importance of developing objective metrics to gauge the accuracy of both clustering paradigms.
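The size of the gap between the two pipelines' p-values can be checked directly from Tables 1 and 2 by taking the base-10 logarithm of their ratio. The short base R calculation below uses only the tabulated values; the variable names are illustrative.

```r
# p-values copied from Table 1 (depth) and Table 2 (oxygen concentration)
p_depth  <- c(mothur = 0.0923485, QIIME2 = 0.0015644)
p_oxygen <- c(mothur = 0.2569128, QIIME2 = 0.0299088)

# Orders of magnitude separating the mothur and QIIME2 p-values
orders_apart <- c(
  depth  = log10(p_depth[["mothur"]] / p_depth[["QIIME2"]]),
  oxygen = log10(p_oxygen[["mothur"]] / p_oxygen[["QIIME2"]])
)
print(round(orders_apart, 2))
```

The gap is a little under two orders of magnitude for depth and just under one for oxygen concentration, and in both cases the two p-values fall on opposite sides of the 0.05 threshold.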
The differences we observed between mothur and QIIME2 in this study have also been seen in other research. For instance, a study of the chicken cecum microbiome by Allali et al. found lower phylogenetic diversity (PD) values with UPARSE pipelines than with de novo and open-reference QIIME pipelines [23]. However, species richness (S) values were comparable across pipelines. In addition, the number of assigned sequences for different sequencing platform runs was affected by the OTU picking method (de novo versus open-reference QIIME pipelines). Furthermore, the QIIME pipelines generated different relative abundances of specific genera compared with UPARSE. Differences in detection profiles, such as the number of unique species, were also observed between pipelines. The number of OTUs produced and the taxonomic assignments identified differed between pipelines at a 99% similarity threshold.
Since multiple classes were identified within the phylum Chloroflexi, and the direction of correlation with depth varied among clusters, future analyses may find more meaning at a sub-phylum level. Additionally, more data were obtained than were analyzed in the current report. In the future, one could look for correlations of abundance with other factors, such as temperature or salinity, not just depth and oxygen concentration. Furthermore, there is a gap in the data between 10 m and 100 m depth, which makes it difficult to determine correlations. More consistent data collection, with more samples at more regular intervals, could help alleviate such problems and potentially reveal significant correlations. Analysis of samples collected over time could show how the diversity in the area changes over seasons, or over longer periods such as decades. For unknown or poorly characterized organisms, it could also be of interest to examine their genetic makeup in more detail in order to determine what roles, if any, they might play in biogeochemical cycles.
[1] Zaikova E., Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191.
[2] Herlinveaux RH. 2011. Journal of the Fisheries Research Board of Canada 19: 1-37.
[3] 2012. Saanich Inlet. MicrobeWiki.
[4] Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E. 2005. Defining operational taxonomic units using DNA barcode data. Philosophical Transactions of the Royal Society B: Biological Sciences 360: 1935–1943.
[5] Callahan BJ, Mcmurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker gene data analysis. Multidisciplinary Journal of Microbial Ecology 11: 2639–2643.
[6] Schmidt TSB, Rodrigues JFM, Christian M. 2014. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Comput Biol. 10.
[7] Wang, Y., Sheng, H., He, Y., Wu, J., Jiang, Y., Tam, N. F., & Zhou, H. 2012. Comparison of the levels of bacterial diversity in freshwater, intertidal wetland, and marine sediments by using millions of illumina tags. Applied and Environmental Microbiology 78: 8264-8271. 10.1128/AEM.01821-12
[8] Thiel V, Hamilton TL, Tomsho LP, Burhans R, Gay SE, Schuster SC, et al. 2014. Draft genome sequence of a sulfide-oxidizing, autotrophic filamentous anoxygenic phototrophic bacterium, Chloroflexus sp. strain MS-G (Chloroflexi). Genome Announc 2: 9–10.
[9] Sekiguchi Y, Yamada T, Hanada S, Ohashi A, Harada H, Kamagata Y. 2003. Anaerolinea thermophila gen. nov., sp. nov. and Caldilinea aerophila gen. nov., sp. nov., novel filamentous thermophiles that represent a previously uncultured lineage of the domain bacteria at the subphylum level. Int J Syst Evol Microbiol 53: 1843–51.
[10] Kiss H, Nett M, Domin N, Martin K, Maresca JA, Copeland A, Lapidus A, Lucas S, Berry KW, Rio TGD, Dalin E, Tice H, Pitluck S, Richardson P, Bruce D, Goodwin L, Han C, Detter JC, Schmutz J, Brettin T, Land M, Hauser L, Kyrpides NC, Ivanova N, Göker M, Woyke T, Klenk H-P, Bryant DA. 2011. Complete genome sequence of the filamentous gliding predatory bacterium Herpetosiphon aurantiacus type strain (114-95T). Standards in Genomic Sciences 5: 356–370.
[11] Wasmund, K., Cooper, M., Schreiber, L., Lloyd, K. G., Baker, B. J., Petersen, D. G., . . . Adrian, L. 2016. Single-cell genome and group-specific dsrAB sequencing implicate marine members of the class dehalococcoidia (phylum chloroflexi) in sulfur cycling. Mbio 7: e00266. 10.1128/mBio.00266-16
[12] Biderre-Petit C, Dugat-Bony E, Mege M, Parisot N, Adrian L, Moné A, Denonfoux J, Peyretaillade E, Debroas D, Boucher D, Peyret P. 2016. Distribution of Dehalococcoidia in the Anaerobic Deep Water of a Remote Meromictic Crater Lake and Detection of Dehalococcoidia-Derived Reductive Dehalogenase Homologous Genes. Plos One 11.
[13] Hugenholtz, P., Goebel, B. M., & Pace, N. R. 1998. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. Journal of Bacteriology 180: 4765-4774.
[14] Xia Y, Wang Y, Wang Y, Chin FYL, Zhang T. 2016. Cellular adhesiveness and cellulolytic capacity in Anaerolineae revealed by omics-based genome interpretation. Biotechnology for Biofuels 9.
[15] Giovannoni SJ, Rappe MS, Vergin KL, Adair NL. 1996. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proceedings of the National Academy of Sciences 93:7979–7984.
[16] Morris RM, Rappé MS, Urbach E, Connon SA, Giovannoni SJ. 2004. Prevalence of the Chloroflexi-related SAR202 bacterioplankton cluster throughout the mesopelagic zone and deep ocean. Appl Environ Microbiol 70: 2836–42.
[17] Mehrshad M, Rodriguez-Valera F, Amoozegar MA, López-García P, Ghai R. 2017. The enigmatic SAR202 cluster up close: shedding light on a globally distributed dark ocean lineage involved in sulfur cycling. The ISME Journal 12: 655–668.
[18] Wegner C-E, Liesack W. 2017. Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags. Frontiers in Microbiology 8.
[19] Ye Q, Wu Y, Zhu Z, Wang X, Li Z, Zhang J. 2016. Bacterial diversity in the surface sediments of the hypoxic zone near the Changjiang Estuary and in the East China Sea. MicrobiologyOpen 5: 323–339.
[20] Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4: 170159.
[21] Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Sci Data 4: 170160.
[22] “Chloroflexi (phylum)” on Revolvy.com. Trivia Quizzes.
[23] Allali I, Arnold J.W., Roach J, Cadenas M.B., Butz N, Hassan H.M., Koci M, Ballou A, Mendoza M, Ali R, Azcarate-Peril M.A. 2017. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome. BMC Microbiology 17: 1-16.

| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0181 | SAR202_clade | SAR202_clade_ge | -0.1713584 | 0.3794814 | -0.4515595 | 0.6704985 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0073322 | 0.0162544 | -0.4510917 | 0.6708136 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0027496 | 0.0165362 | -0.1662767 | 0.8744539 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0023568 | 0.0055057 | -0.4280621 | 0.6864177 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0011784 | 0.0027529 | -0.4280621 | 0.6864177 |
| Otu2381 | Anaerolineae | uncultured | -0.0005237 | 0.0056008 | -0.0935100 | 0.9291298 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0005237 | 0.0056008 | -0.0935100 | 0.9291298 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0002619 | 0.0028004 | -0.0935100 | 0.9291298 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | 0.0001637 | 0.0036177 | 0.0452401 | 0.9656672 |
| Otu3712 | Anaerolineae | uncultured | 0.0008511 | 0.0055928 | 0.1521723 | 0.8850009 |
| Otu3607 | Anaerolineae | uncultured | 0.0034043 | 0.0023533 | 1.4465667 | 0.2076595 |
| Otu2790 | Anaerolineae | Thermomarinilinea | 0.0034043 | 0.0023533 | 1.4465667 | 0.2076595 |
| Otu3623 | Anaerolineae | Thermomarinilinea | 0.0036007 | 0.0053694 | 0.6705821 | 0.5322101 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | 0.0068085 | 0.0047067 | 1.4465667 | 0.2076595 |
| Otu2789 | Anaerolineae | Thermomarinilinea | 0.0068085 | 0.0047067 | 1.4465667 | 0.2076595 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | 0.0070049 | 0.0049223 | 1.4231039 | 0.2139907 |
| Otu1863 | Anaerolineae | Pelolinea | 0.0079214 | 0.0046360 | 1.7086714 | 0.1482107 |
| Otu3589 | Anaerolineae | uncultured | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu1419 | Anaerolineae | Thermomarinilinea | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu2497 | Anaerolineae | Pelolinea | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu1147 | Anaerolineae | Thermomarinilinea | 0.0146645 | 0.0110438 | 1.3278449 | 0.2416173 |
| Otu1983 | Anaerolineae | Thermomarinilinea | 0.0158101 | 0.0123144 | 1.2838745 | 0.2554599 |
| Otu1246 | Anaerolineae | Thermomarinilinea | 0.0158429 | 0.0092720 | 1.7086714 | 0.1482107 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | 0.0170213 | 0.0117667 | 1.4465667 | 0.2076595 |
| Otu0662 | Anaerolineae | Thermomarinilinea | 0.0340426 | 0.0235333 | 1.4465667 | 0.2076595 |
| Otu0551 | Anaerolineae | Thermomarinilinea | 0.0365957 | 0.0465283 | 0.7865264 | 0.4671821 |
| Otu1028 | Anaerolineae | Thermomarinilinea | 0.0374468 | 0.0258867 | 1.4465667 | 0.2076595 |
| Otu0607 | Anaerolineae | Thermomarinilinea | 0.0389853 | 0.0438394 | 0.8892756 | 0.4145865 |
| Otu0799 | Anaerolineae | Pelolinea | 0.0477578 | 0.0280660 | 1.7016226 | 0.1495636 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | 0.1946645 | 0.2102439 | 0.9258985 | 0.3969899 |
| Otu0215 | Anaerolineae | Thermomarinilinea | 0.4527660 | 0.3129935 | 1.4465667 | 0.2076595 |
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | 0.5344681 | 0.3694735 | 1.4465667 | 0.2076595 |

| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv1886 | D_2__SAR202 clade | D_5__ | -0.3397709 | 0.2301978 | -1.4759954 | 0.1999714 |
| Asv800 | D_2__SAR202 clade | D_5__ | -0.1327332 | 0.3683890 | -0.3603073 | 0.7333378 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.0329951 | 0.0770801 | -0.4280621 | 0.6864177 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.0164975 | 0.0385401 | -0.4280621 | 0.6864177 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.0057610 | 0.0616089 | -0.0935100 | 0.9291298 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.0039280 | 0.1354082 | -0.0290085 | 0.9779801 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.0034043 | 0.0364052 | -0.0935100 | 0.9291298 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0007856 | 0.0084012 | -0.0935100 | 0.9291298 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | 0.0011129 | 0.0027583 | 0.4034830 | 0.7032691 |
| Asv2034 | D_2__SAR202 clade | D_5__ | 0.0038298 | 0.0251674 | 0.1521723 | 0.8850009 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | 0.0057610 | 0.0801057 | 0.0719180 | 0.9454553 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | 0.0111293 | 0.0275831 | 0.4034830 | 0.7032691 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | 0.0126023 | 0.0187931 | 0.6705821 | 0.5322101 |
| Asv400 | D_2__uncultured | D_5__ | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Asv496 | D_2__Dehalococcoidia | NA | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Asv2063 | D_2__Anaerolineae | NA | 0.0189198 | 0.0468912 | 0.4034830 | 0.7032691 |
| Asv134 | D_2__Anaerolineae | NA | 0.0234043 | 0.0349014 | 0.6705821 | 0.5322101 |
| Asv1473 | D_2__SAR202 clade | NA | 0.0238298 | 0.0164733 | 1.4465667 | 0.2076595 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | 0.0238298 | 0.0164733 | 1.4465667 | 0.2076595 |
| Asv1234 | D_2__SAR202 clade | D_5__ | 0.0272340 | 0.0188267 | 1.4465667 | 0.2076595 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | 0.0288052 | 0.0429556 | 0.6705821 | 0.5322101 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | 0.0306383 | 0.0211800 | 1.4465667 | 0.2076595 |
| Asv1003 | D_2__SAR202 clade | D_5__ | 0.0306383 | 0.0211800 | 1.4465667 | 0.2076595 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | 0.0340426 | 0.0235333 | 1.4465667 | 0.2076595 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | 0.0396072 | 0.0859823 | 0.4606434 | 0.6643958 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | 0.0414075 | 0.0617486 | 0.6705821 | 0.5322101 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | 0.0418331 | 0.0911012 | 0.4591934 | 0.6653680 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | 0.0476596 | 0.0329467 | 1.4465667 | 0.2076595 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | 0.0522095 | 0.0778570 | 0.6705821 | 0.5322101 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | 0.0578723 | 0.0400067 | 1.4465667 | 0.2076595 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | 0.0583633 | 0.0676342 | 0.8629259 | 0.4276204 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | 0.0748936 | 0.0517734 | 1.4465667 | 0.2076595 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | 0.0792144 | 0.1181278 | 0.6705821 | 0.5322101 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | 0.0955810 | 0.1216822 | 0.7854968 | 0.4677332 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | 0.1054664 | 0.1601592 | 0.6585100 | 0.5393201 |
| Asv2324 | D_2__Anaerolineae | NA | 0.1089362 | 0.0753067 | 1.4465667 | 0.2076595 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | 0.1123404 | 0.0776600 | 1.4465667 | 0.2076595 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | 0.1123404 | 0.0776600 | 1.4465667 | 0.2076595 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | 0.1361702 | 0.0941334 | 1.4465667 | 0.2076595 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | 0.1468412 | 0.1522869 | 0.9642409 | 0.3792105 |
| Asv1095 | D_2__Anaerolineae | NA | 0.1634043 | 0.1129601 | 1.4465667 | 0.2076595 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | 0.1668085 | 0.1153134 | 1.4465667 | 0.2076595 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | 0.1669722 | 0.1917355 | 0.8708462 | 0.4236697 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | 0.1859247 | 0.2109964 | 0.8811750 | 0.4185601 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | 0.3438298 | 0.2376868 | 1.4465667 | 0.2076595 |
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | 0.5208511 | 0.3600602 | 1.4465667 | 0.2076595 |

| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | -0.1897296 | 0.3302310 | -0.5745358 | 0.5904867 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | -0.1736086 | 0.1582994 | -1.0967102 | 0.3227568 |
| Otu0215 | Anaerolineae | Thermomarinilinea | -0.1607263 | 0.2797499 | -0.5745358 | 0.5904867 |
| Otu0181 | SAR202_clade | SAR202_clade_ge | -0.0520091 | 0.2990620 | -0.1739074 | 0.8687601 |
| Otu0607 | Anaerolineae | Thermomarinilinea | -0.0317385 | 0.0336870 | -0.9421584 | 0.3893700 |
| Otu0551 | Anaerolineae | Thermomarinilinea | -0.0269047 | 0.0362727 | -0.7417334 | 0.4915977 |
| Otu0799 | Anaerolineae | Pelolinea | -0.0200649 | 0.0258114 | -0.7773675 | 0.4721013 |
| Otu1028 | Anaerolineae | Thermomarinilinea | -0.0132932 | 0.0231372 | -0.5745358 | 0.5904867 |
| Otu0662 | Anaerolineae | Thermomarinilinea | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Otu1983 | Anaerolineae | Thermomarinilinea | -0.0084593 | 0.0103315 | -0.8187858 | 0.4501563 |
| Otu1147 | Anaerolineae | Thermomarinilinea | -0.0084593 | 0.0092049 | -0.9189965 | 0.4002601 |
| Otu1246 | Anaerolineae | Thermomarinilinea | -0.0072508 | 0.0084400 | -0.8590967 | 0.4295406 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | -0.0060423 | 0.0105169 | -0.5745358 | 0.5904867 |
| Otu1419 | Anaerolineae | Thermomarinilinea | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu2497 | Anaerolineae | Pelolinea | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu3589 | Anaerolineae | uncultured | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | -0.0036254 | 0.0042200 | -0.8590967 | 0.4295406 |
| Otu1863 | Anaerolineae | Pelolinea | -0.0036254 | 0.0042200 | -0.8590967 | 0.4295406 |
| Otu2789 | Anaerolineae | Thermomarinilinea | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu3623 | Anaerolineae | Thermomarinilinea | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0020728 | 0.0128145 | -0.1617557 | 0.8778315 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0012945 | 0.0128349 | -0.1008584 | 0.9235824 |
| Otu3712 | Anaerolineae | uncultured | -0.0012919 | 0.0043048 | -0.3001126 | 0.7761680 |
| Otu2790 | Anaerolineae | Thermomarinilinea | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Otu3607 | Anaerolineae | uncultured | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | -0.0009643 | 0.0027703 | -0.3480925 | 0.7419469 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0006367 | 0.0043341 | -0.1469078 | 0.8889445 |
| Otu2381 | Anaerolineae | uncultured | -0.0006367 | 0.0043341 | -0.1469078 | 0.8889445 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0003254 | 0.0043410 | -0.0749568 | 0.9431557 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0003184 | 0.0021670 | -0.1469078 | 0.8889445 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0001627 | 0.0021705 | -0.0749568 | 0.9431557 |

| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | -0.1848957 | 0.3218175 | -0.5745358 | 0.5904867 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | -0.1764060 | 0.1570152 | -1.1234959 | 0.3122557 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | -0.1646951 | 0.1413959 | -1.1647802 | 0.2966535 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | -0.1439410 | 0.1112113 | -1.2943021 | 0.2521126 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | -0.1220553 | 0.2124416 | -0.5745358 | 0.5904867 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | -0.1061416 | 0.0879361 | -1.2070308 | 0.2814027 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | -0.0969221 | 0.1218861 | -0.7951862 | 0.4625657 |
| Asv800 | D_2__SAR202 clade | D_5__ | -0.0875482 | 0.2864534 | -0.3056281 | 0.7722028 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | -0.0592150 | 0.1030657 | -0.5745358 | 0.5904867 |
| Asv1095 | D_2__Anaerolineae | NA | -0.0580065 | 0.1009624 | -0.5745358 | 0.5904867 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | -0.0531726 | 0.0925488 | -0.5745358 | 0.5904867 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | -0.0483387 | 0.0841353 | -0.5745358 | 0.5904867 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | -0.0476310 | 0.0688397 | -0.6919125 | 0.5198017 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | -0.0452141 | 0.0649448 | -0.6961928 | 0.5173357 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | -0.0447133 | 0.0524914 | -0.8518223 | 0.4332067 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | -0.0398795 | 0.0694116 | -0.5745358 | 0.5904867 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | -0.0398795 | 0.0694116 | -0.5745358 | 0.5904867 |
| Asv2324 | D_2__Anaerolineae | NA | -0.0386710 | 0.0673082 | -0.5745358 | 0.5904867 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | -0.0350456 | 0.0609981 | -0.5745358 | 0.5904867 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | -0.0277948 | 0.0483778 | -0.5745358 | 0.5904867 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | -0.0265863 | 0.0462744 | -0.5745358 | 0.5904867 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.0252671 | 0.1043155 | -0.2422180 | 0.8182327 |
| Asv2063 | D_2__Anaerolineae | NA | -0.0205440 | 0.0357575 | -0.5745358 | 0.5904867 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | -0.0205440 | 0.0357575 | -0.5745358 | 0.5904867 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | -0.0193355 | 0.0336541 | -0.5745358 | 0.5904867 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | -0.0169186 | 0.0294474 | -0.5745358 | 0.5904867 |
| Asv134 | D_2__Anaerolineae | NA | -0.0157101 | 0.0273440 | -0.5745358 | 0.5904867 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | -0.0112835 | 0.0618942 | -0.1823031 | 0.8625058 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | -0.0108762 | 0.0189304 | -0.5745358 | 0.5904867 |
| Asv1003 | D_2__SAR202 clade | D_5__ | -0.0108762 | 0.0189304 | -0.5745358 | 0.5904867 |
| Asv1234 | D_2__SAR202 clade | D_5__ | -0.0096677 | 0.0168271 | -0.5745358 | 0.5904867 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv1473 | D_2__SAR202 clade | NA | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.0070038 | 0.0476747 | -0.1469078 | 0.8889445 |
| Asv2034 | D_2__SAR202 clade | D_5__ | -0.0058137 | 0.0193716 | -0.3001126 | 0.7761680 |
| Asv496 | D_2__Dehalococcoidia | NA | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Asv400 | D_2__uncultured | D_5__ | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.0045554 | 0.0607736 | -0.0749568 | 0.9431557 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.0041386 | 0.0281714 | -0.1469078 | 0.8889445 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.0022777 | 0.0303868 | -0.0749568 | 0.9431557 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0009551 | 0.0065011 | -0.1469078 | 0.8889445 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Asv1886 | D_2__SAR202 clade | D_5__ | 0.1614351 | 0.2011515 | 0.8025547 | 0.4586642 |

| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0181 | SAR202_clade | SAR202_clade_ge | -2.4575728 | 3.3035421 | -0.7439205 | 0.4903847 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | -0.9242423 | 2.0042270 | -0.4611465 | 0.6640586 |
| Otu0607 | Anaerolineae | Thermomarinilinea | -0.1124721 | 0.4212890 | -0.2669714 | 0.8001524 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0796436 | 0.1448049 | -0.5500061 | 0.6059809 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0796436 | 0.1448049 | -0.5500061 | 0.6059809 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 |
| Otu0551 | Anaerolineae | Thermomarinilinea | -0.0280166 | 0.4433827 | -0.0631883 | 0.9520649 |
| Otu2381 | Anaerolineae | uncultured | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 |
| Otu3712 | Anaerolineae | uncultured | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0309087 | -0.7362104 | 0.4946701 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 |
| Otu3623 | Anaerolineae | Thermomarinilinea | 0.0032080 | 0.0503917 | 0.0636606 | 0.9517072 |
| Otu3607 | Anaerolineae | uncultured | 0.0552843 | 0.0049066 | 11.2673382 | 0.0000962 |
| Otu2790 | Anaerolineae | Thermomarinilinea | 0.0552843 | 0.0049066 | 11.2673382 | 0.0000962 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | 0.0584922 | 0.0454851 | 1.2859653 | 0.2547855 |
| Otu1863 | Anaerolineae | Pelolinea | 0.0991909 | 0.0280249 | 3.5393861 | 0.0165734 |
| Otu2789 | Anaerolineae | Thermomarinilinea | 0.1105686 | 0.0098132 | 11.2673382 | 0.0000962 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | 0.1105686 | 0.0098132 | 11.2673382 | 0.0000962 |
| Otu1983 | Anaerolineae | Thermomarinilinea | 0.1185885 | 0.1161660 | 1.0208533 | 0.3541507 |
| Otu1147 | Anaerolineae | Thermomarinilinea | 0.1203422 | 0.1022047 | 1.1774631 | 0.2920002 |
| Otu1246 | Anaerolineae | Thermomarinilinea | 0.1983818 | 0.0560498 | 3.5393861 | 0.0165734 |
| Otu3589 | Anaerolineae | uncultured | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 |
| Otu1419 | Anaerolineae | Thermomarinilinea | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 |
| Otu2497 | Anaerolineae | Pelolinea | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | 0.2764214 | 0.0245330 | 11.2673382 | 0.0000962 |
| Otu0662 | Anaerolineae | Thermomarinilinea | 0.5528428 | 0.0490660 | 11.2673382 | 0.0000962 |
| Otu1028 | Anaerolineae | Thermomarinilinea | 0.6081271 | 0.0539726 | 11.2673382 | 0.0000962 |
| Otu0799 | Anaerolineae | Pelolinea | 0.6618074 | 0.1140109 | 5.8047714 | 0.0021396 |
| Otu0215 | Anaerolineae | Thermomarinilinea | 7.3528091 | 0.6525773 | 11.2673382 | 0.0000962 |
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | 8.6796317 | 0.7703356 | 11.2673382 | 0.0000962 |

| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv800 | D_2__SAR202 clade | D_5__ | -2.9695672 | 3.0816822 | -0.9636189 | 0.3794937 |
| Asv1886 | D_2__SAR202 clade | D_5__ | -2.2758300 | 2.2620814 | -1.0060778 | 0.3605552 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | -1.2574021 | 1.7629163 | -0.7132512 | 0.5075876 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | -1.1503411 | 1.9735619 | -0.5828756 | 0.5852742 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | -1.1127006 | 1.4059595 | -0.7914172 | 0.4645707 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | -1.0569670 | 1.0591512 | -0.9979378 | 0.3641245 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.6485262 | 1.1827890 | -0.5483025 | 0.6070659 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | -0.6042138 | 1.4769566 | -0.4090938 | 0.6994049 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | -0.5119943 | 0.8044173 | -0.6364785 | 0.5524571 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | -0.4892390 | 0.7585529 | -0.6449637 | 0.5473730 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | -0.3868402 | 0.6996938 | -0.5528707 | 0.6041591 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.3185743 | 0.6912399 | -0.4608737 | 0.6642415 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.2503083 | 0.5431170 | -0.4608737 | 0.6642415 |
| Asv2063 | D_2__Anaerolineae | NA | -0.1934201 | 0.4196813 | -0.4608737 | 0.6642415 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.1592871 | 0.3456199 | -0.4608737 | 0.6642415 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.1479095 | 0.3209328 | -0.4608737 | 0.6642415 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | -0.1137765 | 0.2468714 | -0.4608737 | 0.6642415 |
| Asv2034 | D_2__SAR202 clade | D_5__ | -0.1023989 | 0.2221842 | -0.4608737 | 0.6642415 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | -0.0964323 | 0.6505274 | -0.1482371 | 0.8879484 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | 0.0112279 | 0.1763709 | 0.0636606 | 0.9517072 |
| Asv134 | D_2__Anaerolineae | NA | 0.0208518 | 0.3275459 | 0.0636606 | 0.9517072 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | 0.0256637 | 0.4031335 | 0.0636606 | 0.9517072 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | 0.0368916 | 0.5795044 | 0.0636606 | 0.9517072 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | 0.0465155 | 0.7306794 | 0.0636606 | 0.9517072 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | 0.0705752 | 1.1086170 | 0.0636606 | 0.9517072 |
| Asv400 | D_2__uncultured | D_5__ | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 |
| Asv496 | D_2__Dehalococcoidia | NA | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 |
| Asv1473 | D_2__SAR202 clade | NA | 0.3869900 | 0.0343462 | 11.2673382 | 0.0000962 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | 0.3869900 | 0.0343462 | 11.2673382 | 0.0000962 |
| Asv1234 | D_2__SAR202 clade | D_5__ | 0.4422742 | 0.0392528 | 11.2673382 | 0.0000962 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | 0.4975585 | 0.0441594 | 11.2673382 | 0.0000962 |
| Asv1003 | D_2__SAR202 clade | D_5__ | 0.4975585 | 0.0441594 | 11.2673382 | 0.0000962 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | 0.5528428 | 0.0490660 | 11.2673382 | 0.0000962 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | 0.7739799 | 0.0686923 | 11.2673382 | 0.0000962 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | 0.9398327 | 0.0834121 | 11.2673382 | 0.0000962 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | 1.2162541 | 0.1079451 | 11.2673382 | 0.0000962 |
| Asv2324 | D_2__Anaerolineae | NA | 1.7690969 | 0.1570111 | 11.2673382 | 0.0000962 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | 1.8243812 | 0.1619177 | 11.2673382 | 0.0000962 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | 1.8243812 | 0.1619177 | 11.2673382 | 0.0000962 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | 2.2113711 | 0.1962638 | 11.2673382 | 0.0000962 |
| Asv1095 | D_2__Anaerolineae | NA | 2.6536454 | 0.2355166 | 11.2673382 | 0.0000962 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | 2.7089297 | 0.2404232 | 11.2673382 | 0.0000962 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | 5.5837121 | 0.4955662 | 11.2673382 | 0.0000962 |
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | 8.4584946 | 0.7507092 | 11.2673382 | 0.0000962 |
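
The columns in the regression tables above are related in a simple way: the t value is the Estimate divided by its Std. Error, and the p-value is the two-sided tail probability of that t value under a Student's t distribution. The sketch below recomputes both for one row (Otu3589 in the first table). It rests on an assumption that is not stated in the source: 5 residual degrees of freedom, which is what the reported t/p pairs appear to be consistent with. The helper names `t_pdf` and `two_sided_p` are hypothetical, and the tail probability is obtained by plain numerical integration rather than by any particular statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|), using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|), by symmetry
    return 1.0 - central_mass


# Values copied from the Otu3589 row of the first table.
estimate, std_error = 0.0136170, 0.0094133
DF = 5  # ASSUMPTION: residual degrees of freedom, inferred, not given in the source

t_value = estimate / std_error
p_value = two_sided_p(t_value, DF)
print(f"t = {t_value:.4f}, p = {p_value:.4f}")
```

Under that assumption, the script should reproduce the tabulated t value of about 1.447 and p-value of about 0.208 for this row. Rows that share the same t value necessarily share the same p-value, which is why many entries in the tables repeat.
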
Comment on the emergence of microbial life and the evolution of Earth systems
Earth is over 4500 million years old. Life on Earth is believed to have emerged about 3900 million years ago, and it has survived extinction-level events such as massive meteorite bombardments, hot-ocean “bottlenecks”, and worldwide glaciation. During events such as the hot-ocean bottlenecks, only hyperthermophiles (or lithotrophs living in the deep crust) would have been able to survive, which makes it plausible that early life diversified near hydrothermal vents (or arrived from elsewhere in space, possibly Mars), where photosynthetic organisms, housekeeping proteins, and core biochemical processes developed. The development of anoxygenic and then oxygenic photosynthesis allowed life to escape hydrothermal settings and expand into new environments. Most of the biochemical pathways that sustain the biosphere today had evolved by about 3500 million years ago.